1. INTRODUCTION


There are few things more frustrating than spending a
great deal of time debugging syntax errors in a program.
Often, an error causes a compiler to make an incorrect
assumption, which leads to a confusing error message as well
as to the generation of many additional error messages. The
final recovery often requires skipping large portions of the
input string. Any additional errors in the skipped portions
then go undetected until future runs of the program.

A good syntactic error recovery method should detect each
error in a program and give an accurate and meaningful error
message for it. This means recovering completely and resuming
the parse at the point of each error, so as not to miss any
subsequent errors. The advantages of a compiler having a good
error recovery method are obvious. The disadvantages are that
such a method may be either very costly to develop or very
inefficient to use. An automatically generated error recovery
method solves the first of these two problems.


1.1 Survey of Previous Work


The following is a brief survey of work done on
automatically generated error recovery schemes. More
detailed surveys are contained in LaFrance [8] and in Rhodes
[12].

The simplest error recovery scheme is commonly referred
to as "panic mode". Upon detecting an error, the parser
enters its "panic mode" and discards symbols from the input
string until it encounters one that belongs to a set of
special symbols. The parser then backs up on the parsing
stack to a point where that special symbol is a legal input
symbol. This method does not qualify as a good error
recovery scheme. It is extremely fast, but it may skip over
large portions of the input string, and its error messages
generally consist only of an indication of the input symbol
on which the error was detected.
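The following is a minimal sketch of panic mode in Python. It assumes a parser whose stack holds integer states and a hypothetical predicate `is_legal(state, symbol)` that says whether a symbol is a legal input in a given state. The synchronizing set `sync` is also an assumption; statement terminators are a typical choice. Everything skipped in the first loop is simply lost, which is the weakness described above.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple


def panic_mode_recover(
    stack: List[int],
    tokens: Sequence[str],
    pos: int,
    sync: Set[str],
    is_legal: Callable[[int, str], bool],
) -> Optional[Tuple[List[int], int]]:
    """Panic-mode recovery: skip input, then pop the state stack.

    Returns the trimmed stack and the index of the synchronizing
    symbol, or None if no recovery point exists.
    """
    # Discard input symbols until one from the special set appears.
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    if pos == len(tokens):
        return None  # ran off the end of the input
    stack = list(stack)  # do not mutate the caller's stack
    # Back up the parsing stack until the symbol is a legal input.
    while stack and not is_legal(stack[-1], tokens[pos]):
        stack.pop()
    if not stack:
        return None
    return stack, pos


legal = lambda state, sym: state == 3 and sym == ";"
print(panic_mode_recover([0, 3, 7], ["x", "+", "+", ")", ";", "y"], 2, {";"}, legal))
# ([0, 3], 4)
```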


In 1963, Irons [5] published an automatically generated
error recovery method which he had developed and implemented
for a non-backtrack top-down parsing algorithm. In 1970,
Leinius [9] described, but did not implement, a scheme which
appears to be a more sophisticated version of the "panic
mode" method. Leinius' method is based primarily on a
simple precedence parsing algorithm [14], but he also
discusses the application of his method to LR parsing
algorithms [1]. In 1972, L. R. James [6] implemented
Leinius' error recovery ideas on an LALR(k) parser. In
1971, LaFrance [8] developed and implemented an automatic
error recovery method as part of a translator writing system
which produces a Floyd-Evans production parser. The
LaFrance method produces good results, but it is restricted
by a bounded look-ahead and by the fact that it does not
attempt to modify the parsing stack. Also in 1971, Levy
[10] proposed, but did not implement, an automatic error
recovery scheme. Levy's scheme has both unbounded
look-ahead and the ability to modify the parsing stack.
However, it may run into combinatorial problems, which makes
its practicality questionable. In 1972, Partridge [11]
developed and implemented an automatic error recovery system
that can collect statistics on the programs run through it
and modify itself accordingly to increase its efficiency.
Partridge's method does, however, entail a considerable
amount of overhead, even while parsing correct portions of
programs. In 1971, Rhodes [12] developed a good automatic
error recovery method for simple precedence parsers. Rhodes
implemented his method both for full Pascal [13] and for a
subset of Algol W. Rhodes' method is effective as well as
efficient to use. A paper by Graham and Rhodes [3] contains
a brief overview of simple precedence parsing and of error
detection in simple precedence parsers, as well as a
description of Rhodes' method.


1.2 Overview of Thesis

The motivation for this work came from reading about
Rhodes' method in the paper by Graham and Rhodes [3]. The
original idea was to develop an error recovery scheme that
would extend Rhodes' ideas to LR parsers [7]. The reader is
presumed to have a knowledge of LR parsing techniques [1].
Section 2 contains a description of Rhodes' error recovery
method. Rhodes' ideas proved difficult to apply directly to
LR parsers, and some of our original ideas changed during the
course of this work. Section 3 describes the evolution from
the original ideas to the method which was finally adopted.
Section 4 gives a detailed explanation of the error recovery
scheme. Section 5 describes its implementation and discusses
the results of some sample programs run on it. Section 6
evaluates the effectiveness of the error recovery method,
gives an idea of its efficiency both in terms of memory and
execution time, and discusses the ease with which it can be
implemented. Section 7 suggests some improvements and
restrictions that could be placed on the error recovery
method in order to improve its efficiency, and presents some
conclusions about the effectiveness and practicality of the
method.


2. RHODES' METHOD


2.1 General Description

Rhodes' method is an automatic recovery method without
a fixed bound on lookahead. It determines the most likely
correction by using cost vectors, which give a cost for
the insertion, deletion, and replacement of both terminals
and nonterminals.

The method consists basically of two parts. The first
is a Condensation Phase, where an attempt is made to
localize the occurrence of the error, and the second is a
Correction Phase, where an attempt is made to determine what
changes are necessary to correct the error.


2.2 Condensation Phase


A ?1 is used to mark the point of the error at the
juncture of the parsing stack and the input string.


The Condensation Phase consists of both a backward move
and a forward move. For the backward move, the ?1 is
assumed to be a ·> simple precedence relation [14], which
means that the symbol on the top of the stack has greater
precedence than the input symbol on which the error was
detected. The backward move consists of making all possible
reductions to the left of the ?1, as long as the result
forms a prefix of a valid right-part (RP) of a production of
the grammar and a precedence relation holds between the
symbol to the left of the prospective RP and the
corresponding left-part (LP).


For the forward move, the ?1 is assumed to be a <·
simple precedence relation, which means the input symbol
will be shifted onto the stack. The forward move consists
of continuing the parse to the right of the ?1 until another
error is detected (which is marked by a ?2 on the parsing
stack). A second error will always be detected by the time
the parser attempts to reduce over the ?1, since the ?1
cannot be contained in a valid RP. At this point the ?2 is
assumed to be a ·> simple precedence relation, and all
possible reductions are made using the symbols to the
immediate left of the ?2, in the same manner as for the ?1 in
the backward move.

This completes the Condensation Phase. At this point
the error recovery method continues to the Correction
Phase, assuming that the error has been localized to a
section of the parsing stack bounded on the left by the
first <· simple precedence relation to the left of ?1 and
bounded on the right by the ?2.


2.3 Correction Phase

The Correction Phase assumes that the error is contained in
the localized area determined by the Condensation Phase and
considers three possible substrings of that area as
candidates for change.


Left bounds of the three candidate substrings:

   Candidate #1: the first [illegible] to the left of ?1
   Candidate #2: ?1
   Candidate #3: the first <· to the left of ?1


The three possible substrings are pattern matched
against the RPs of all productions of the grammar whose
corresponding LPs have precedence relations with both the
symbols to the left and to the right of the substrings. Using
the costs obtained from the Insertion, Deletion, and
Replacement Vectors, a cost is computed for each attempted
pattern match, and the solution with the minimum cost is
used.
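Rhodes' cost-directed choice can be approximated with a weighted edit distance. The sketch below is not Rhodes' own left-biased pattern matcher. It substitutes the standard dynamic-programming edit distance, with hypothetical dictionaries `ins`, `dele`, and `rep` standing in for the Insertion, Deletion, and Replacement Vectors. It also leaves out the precedence-relation filter on LPs; a caller would apply that filter before passing in `productions`.

```python
from typing import Dict, List, Sequence, Tuple

Production = Tuple[str, Sequence[str]]  # (LP, RP)


def match_cost(
    found: Sequence[str],
    rp: Sequence[str],
    ins: Dict[str, int],
    dele: Dict[str, int],
    rep: Dict[Tuple[str, str], int],
) -> int:
    """Cheapest edit turning the candidate substring `found` into `rp`."""
    n, m = len(found), len(rp)
    # d[i][j]: minimum cost of turning found[:i] into rp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + dele[found[i - 1]]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins[rp[j - 1]]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = found[i - 1], rp[j - 1]
            # Matching symbols are free; otherwise use the replacement
            # cost, falling back to delete-then-insert if none is given.
            sub = 0 if a == b else rep.get((a, b), dele[a] + ins[b])
            d[i][j] = min(
                d[i - 1][j] + dele[a],  # delete a
                d[i][j - 1] + ins[b],  # insert b
                d[i - 1][j - 1] + sub,  # keep or replace
            )
    return d[n][m]


def best_production(
    found: Sequence[str],
    productions: List[Production],
    ins: Dict[str, int],
    dele: Dict[str, int],
    rep: Dict[Tuple[str, str], int],
) -> Tuple[int, str, Tuple[str, ...]]:
    """Return (cost, LP, RP) of the minimum-cost match."""
    return min(
        (match_cost(found, rp, ins, dele, rep), lp, tuple(rp))
        for lp, rp in productions
    )
```

With costs `{"(": 1, ")": 1, "E": 3, "id": 2}` for both insertion and deletion, the substring `( E` matches `F => ( E )` at a cost of 1 by inserting `)`. A different matcher could be swapped in by replacing `match_cost` alone.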

3. EVOLUTION OF THE ERROR RECOVERY METHOD


The original idea was to develop an error recovery
method that would extend Rhodes' ideas to LR parsers.

The problem with applying Rhodes' method directly to LR
parsers is that it is not easy to determine the possible
left and right ends of the RP, as Rhodes has so neatly done
with the precedence parser. A set of possible right ends of
the RP can be determined by continuing all possible parses
in parallel from the point of error detection on (one
possible parse for each state from which the error symbol
may be legally read), until each possible parse either
encounters another error or attempts to make a reduction
which extends past the top of the stack as it existed when
the error was detected. (We call this "reducing over the
error point".) Then, for each possible parse, the right end
of the RP is the point at which that parse attempted to
reduce over the error point.


The problem then becomes choosing the most likely
possible parse, which need not be among those
indicated. If the input symbol at the error point is a
completely erroneous symbol which should be deleted, then
all of the indicated parses are incorrect and a new set of
possible parses will have to be discovered. How often an
error is caused by an erroneous symbol that is detected
immediately depends on how similar the constructs of the
language are, as well as on the programs being run. An
estimate of this frequency would be useful in planning a
strategy for choosing the initial set of possible parses.
If erroneous symbols occurred frequently and were usually
detected immediately, then it would be more efficient to
skip the input symbol at the point of error detection and
use the immediately following symbol to set up the possible
parses.


This still leaves the problem of finding the left end
of the RP for each possible parse. This problem could be
deferred until the pattern match by using a right-biased,
right-to-left pattern match similar to the left-biased,
left-to-right pattern match used in Rhodes' method. For each
possible parse, the target of the pattern match would be the
RP of the production that the parser was attempting when it
tried to reduce over the error point. The symbols implied
by the states on the parsing stack of each possible parse
would be matched against the corresponding target RP.
However, merely matching a target RP without accumulating a
cost too large to be acceptable would not guarantee success.
A check for correct left context must also be made.



A further problem is that the error may have caused a
reduction that otherwise would not have occurred, or it may
have prevented a reduction that otherwise would have
occurred. Making all possible reductions at the error point,
regardless of the next symbol in the input string, would be
the LR parser equivalent of the Condensation Phase in Rhodes'
method, and it was thought that performing such reductions
would provide some help.


However, the real problem is in not knowing the exact
location on the parsing stack of the left end of the RP
being looked for. In fact, an incorrect reduction may have
previously occurred in the vicinity of the left end, and the
exact left end may no longer exist on the parsing stack.
Rhodes' method did not seem to be bothered by the
possibility of reductions occurring that should not have, or
of reductions not occurring that should have. For the LR
parser, though, this problem, combined with the inability to
accurately determine the left end of the RP, would most
likely cause much poorer results than Rhodes' method did for
the precedence parser. Whenever necessary, the pattern
match could expand the nonterminals implied by the states on
the parsing stack into their possible RPs. Also, if
necessary, a series of symbols that form an RP could be
condensed into the corresponding LP.

However, expanding and condensing symbols while pattern
matching leads to a very large number of possibilities.
During the pattern match, each time a symbol did not match,
a check would have to be made for all possible expansions of
the symbol as well as for all possible nonterminals of which
the symbol could be an expansion.

Another problem to consider is that the possible
expansions of nonterminals are only possibilities. Once a
reduction into a nonterminal is made, there is no way of
knowing which RP was reduced into that nonterminal. Thus there
is no way to be sure what the original input string looked
like. This means the cost of deleting a nonterminal called
<expression> would have to be the same regardless of whether
it originally consisted of a single identifier or of a very
large arithmetic expression. Another example of this
problem would arise in a language with the productions

    <left-part>  =>  <identifier> := <left-part>
                 =>  <identifier> :=


where the symbol := is an assignment operator. In this
case, deleting an identifier and an assignment operator
(e.g., "A:=") which reduced into the nonterminal <left-part>
would have the same cost as deleting a <left-part> which
originally consisted of a chain of multiple assignments
(e.g., "A:=B:=C:=D:=E:=").

All of the above considerations led to the decision to
save the original input string in tokenized form, and to
consider only insertions and deletions of terminals. Besides
guaranteeing that the original input is precisely known,
this makes error messages much clearer, since they refer to
specific terminal symbols instead of nonterminals in the
grammar. This is especially helpful to the user who would
have no idea of what a nonterminal was. This is important,
since the inexperienced programmer is precisely the type of
user who would derive the most benefit from a thorough
syntactic error recovery method.

Although the idea of pattern matching to the closest RP
worked well and efficiently for precedence parsers, it
cannot easily be adapted to LR parsers. It must also be
taken into consideration that the states in an LR parser
contain more information about the previous input than
simply the symbol on which they were entered. This
additional information would not be utilized by merely
pattern matching against the symbols on which the states
were entered.


The idea of pattern matching to the closest RP was
finally abandoned. The method used, which is described in
the next section, attempts to use the information contained
in the states to make each possible parse continuous across
its error point. The method works with the tokenized input
string and only considers insertions and deletions of
terminals.

4. THE ERROR RECOVERY METHOD


4.1 General Description


The method is to continue a series of possible parses
in parallel from the point of error detection on. There is
no fixed bound on how far they may continue. They all
proceed until they either encounter another error or
attempt to reduce over the error point. Any possible
parse attempting to reduce over the error point is
considered to be a candidate for correction. Insertions and
deletions of terminals are considered for each candidate for
correction in an attempt to make it a continuous parse
across the error point. There is an Insertion Cost Vector
and a Deletion Cost Vector, which respectively contain the
insertion and deletion costs of each terminal in the
language. A record is kept of the individual changes and
of the total cost of all the changes for each candidate until
either the possible parse is corrected or the total cost
exceeds some fixed limit. All candidates that are corrected
without accumulating a total cost greater than the limit are
considered possible solutions and are printed as such.
The error recovery method then indicates its choice of the
most probable of these possible solutions, which is the
solution with the lowest total cost.
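The bookkeeping in this paragraph can be sketched as follows. `Candidate` and `choose_solution` are hypothetical names. The sketch assumes integer costs and accepts a total equal to the limit, since only costs greater than the limit are rejected. Breaking ties by list order is also an assumption; the method itself only requires that a single solution be chosen.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Change = Tuple[str, str]  # ("insert" or "delete", terminal)


@dataclass
class Candidate:
    parse_id: int
    changes: List[Change] = field(default_factory=list)
    cost: int = 0
    corrected: bool = False


def choose_solution(
    candidates: List[Candidate], limit: int
) -> Tuple[List[Candidate], Optional[Candidate]]:
    """Keep corrected candidates within the cost limit; pick the cheapest."""
    solutions = [c for c in candidates if c.corrected and c.cost <= limit]
    # min() returns the first of several equal-cost solutions, so exactly
    # one solution is chosen even when costs tie.
    best = min(solutions, key=lambda c: c.cost, default=None)
    return solutions, best
```

The full `solutions` list corresponds to what is printed as possible solutions. `best` is the one the method reports as most probable.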


4.2 Setup of Possible Parses


A possible parse is set up for each state for which the
parser action would be a "shift" given the input symbol at
the point of error detection. It is not necessary to set up
possible parses for the states which indicate a "reduce"
parser action on the error input symbol. States from which
the parser action would be a reduction are considered
ancestral to the states which could result from that
reduction, and the error recovery method takes this
ancestral relation into account when it works with the
candidates for correction. Therefore, for this error
recovery method, setting up one possible parse for each
state that would shift on the input symbol is sufficient to
cover all possible cases where the input symbol would be
legal.
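A minimal sketch of the setup step is shown below. The ACTION table is a dictionary from `(state, terminal)` to `("shift", next_state)` or `("reduce", production)`, and states are integers; both are hypothetical representations. The ?1 is implicit: it lies between `left` and `right`. The error symbol itself would be shifted by the first parallel step.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical ACTION table: (state, terminal) -> ("shift", next state)
# or ("reduce", production number). Missing entries are errors.
Action = Dict[Tuple[int, str], Tuple[str, int]]


@dataclass
class PossibleParse:
    left: List[int]   # the real state stack when the error was detected
    right: List[int]  # states continued to the right of the ?1 barrier


def setup_possible_parses(
    action: Action, states: Iterable[int], symbol: str, error_stack: List[int]
) -> List[PossibleParse]:
    """One possible parse per state that would shift `symbol`."""
    parses = []
    for s in sorted(states):
        act = action.get((s, symbol))
        # States that would reduce on `symbol` are skipped: they are
        # ancestral to the states their reductions lead to.
        if act is not None and act[0] == "shift":
            parses.append(PossibleParse(left=list(error_stack), right=[s]))
    return parses
```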


A ?1 is assumed to exist at the discontinuous point in
each possible parse, which is the point between the state on
which the error was detected and the state that was assumed
correct when setting up that possible parse. While parsing,
each possible parse keeps track of where its ?'s are.


4.3 Parallel Parsing

All of the possible parses proceed in parallel. Each
one makes all possible reductions, but as soon as it makes a
shift, control is passed to the next possible parse. Any
parse encountering an error is discarded and the remaining
parses continue. Upon attempting to reduce over the ?1, a
possible parse is considered a candidate for correction.
If the correction attempt fails, then the possible parse is
discarded. If the correction attempt succeeds, then the
corrections are tentatively recorded, and the process
continues as long as at least one possible parse has not
attempted a reduction over the ?1. This continues either
until all possible parses have been discarded, or until all
those remaining have successfully provided a correction.
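The round-robin scheme can be sketched as follows. The sketch assumes dictionary-based ACTION and GOTO tables and a `prods` map from production number to (LP, RP length). All three are hypothetical representations, and there is no accept action. Each parse keeps only its states to the right of the ?1, so a reduction that would pop all of them is treated as reducing over the error point. The method described above attempts the correction immediately; the sketch instead collects each candidate together with the input position at which it appeared.

```python
from typing import Dict, List, Sequence, Tuple

Action = Dict[Tuple[int, str], Tuple[str, int]]  # hypothetical ACTION table
Goto = Dict[Tuple[int, str], int]                # hypothetical GOTO table
Prods = Dict[int, Tuple[str, int]]               # production -> (LP, |RP|)


def step(right: List[int], sym: str, action: Action, goto: Goto, prods: Prods):
    """Make all reductions, then one shift, for a single possible parse.

    `right` holds the states to the right of ?1. Returns a status of
    "shifted", "error", or "candidate" plus the updated state list.
    """
    right = list(right)
    while True:
        act = action.get((right[-1], sym))
        if act is None:
            return "error", right
        kind, arg = act
        if kind == "shift":
            right.append(arg)
            return "shifted", right
        lp, k = prods[arg]
        if k >= len(right):
            # Popping k states would cross the ?1: reduce over the error point.
            return "candidate", right
        del right[len(right) - k:]
        right.append(goto[(right[-1], lp)])


def run_parallel(parses: List[List[int]], tokens: Sequence[str], pos: int,
                 action: Action, goto: Goto, prods: Prods):
    """Round-robin the possible parses over the input, one shift each."""
    active = dict(enumerate(parses))
    candidates: Dict[int, Tuple[List[int], int]] = {}
    while active and pos < len(tokens):
        for pid in list(active):
            status, right = step(active[pid], tokens[pos], action, goto, prods)
            if status == "shifted":
                active[pid] = right
            else:
                del active[pid]  # errors are dropped; candidates move on
                if status == "candidate":
                    candidates[pid] = (right, pos)
        pos += 1  # every surviving parse has shifted tokens[pos]
    return candidates, active
```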

For the case where the process ends with all the
remaining parses having been corrected, the corrections that
were tentatively recorded for each of the remaining parses
are printed as possible solutions. A decision is then made
as to the most probable solution, the possible parse of that
solution is indicated, and the error recovery routine is
exited. Usually the error recovery routine returns to the
regular parser. However, it could return from a recursive
call of itself. The conditions under which the error
recovery routine calls itself recursively are discussed in
the next paragraphs. Regardless of where the error recovery
routine returns, or how many of the possible solutions had
costs equal to the minimum, only one solution is returned.

For the case where all of the possible parses have been
discarded, there are still two possibilities. The input
symbol that was originally assumed correct in setting up the
possible parses can be deleted, or the error recovery
routine can assume that a second error caused all the parses
to be discarded and can call itself recursively in an
attempt to correct the second error independently.


Not all of the discarded possible parses are
irretrievable. Those that became candidates for correction
but could not correct the error within the cost limit are
completely disregarded, as are all parses that encounter a
second error within one symbol of the original error.
However, for each pass through the possible parses, until at
least one of the possible parses has been able to perform a
shift, a pointer is kept to each discarded parse that
encountered its error two or more symbols after the original
error. Whenever all the possible parses have been
discarded, but two or more symbols have been shifted since
the possible parses were set up, it is assumed that the
original input symbol assumed in setting up the possible
parses may still be correct and that the parses were all
discarded because a second error was detected in the input
string. By re-instating those parses that encountered their
second error on the current input symbol and then calling
itself recursively, the error recovery routine attempts to
tackle this new error independently of the first error.

If all of the possible parses have been discarded
before two or more symbols have been shifted, then the input
symbol that was originally assumed to be correct is
permanently deleted, and an error message to that effect is
printed at this point. The error recovery routine then
restarts itself, setting up new possible parses using the
symbol immediately following the one that was just deleted.
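The choice described in the last two paragraphs reduces to a small decision function. The sketch below is hypothetical. It measures "symbols shifted" as the distance between input indices. It also falls back to deletion when no parse failed on the current symbol, a case the method as described does not address.

```python
from typing import List, NamedTuple, Tuple


class Discarded(NamedTuple):
    parse_id: int
    error_pos: int  # input index of the symbol on which this parse failed


def after_all_discarded(
    setup_pos: int, current_pos: int, discarded: List[Discarded]
) -> Tuple[str, List[int], int]:
    """Choose between a recursive call and deleting the assumed symbol.

    setup_pos:   index of the input symbol assumed correct at setup
    current_pos: index of the current input symbol
    Returns (action, parse ids to re-instate, position to resume at).
    """
    shifted = current_pos - setup_pos
    if shifted >= 2:
        # Assume a second, independent error: revive the parses that
        # failed on the current symbol and recurse from here.
        revived = [d.parse_id for d in discarded if d.error_pos == current_pos]
        if revived:
            return "recurse", revived, current_pos
    # Too early to blame a second error (or nothing to revive):
    # delete the assumed symbol and restart just after it.
    return "delete", [], setup_pos + 1
```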

There is one other situation that could arise. A
possible parse could attempt to reduce over the error point,
enter the Correction Phase, supposedly correct the error,
but then be unable to reparse up to the point in the input
string where the reduction over the error point was
attempted. An example of this is shown by the following
statements.


    x := (y + (z*5;

    z := z + 1;

An apparent solution is to insert two right parentheses.
The error is detected when the ";" is the input symbol. The
possible parses "shift" the ";" before attempting to reduce
over the error point. (Notice that according to the grammar
[Appendix A], a semicolon is used as a statement
terminator.) The Correction Phase inserts a ")" after the
"5". However, since two right parentheses were needed, the
parse encounters an error at the same point during the
reparse. A possible correction has apparently been found,
but with that correction the possible parse is unable to
reparse up to the point in the input string that the other
possible parses have parsed to. Since all possible parses
must proceed in parallel, it must be discarded. A list of
pointers is also kept to the possible parses discarded
under these conditions. If another possible parse either
finds a correction and is able to reparse further, or parses
further than the discarded ones did before attempting to
reduce over the error point, then these parses are
permanently discarded. If not, then these discarded parses
are re-instated and the error routine calls itself
recursively from the point at which they encountered the
error on the attempted reparse. This is in fact what
happens in the above example. Upon entering the error
routine again, a second ")" is inserted, and this time the
parse is completely corrected.


4.4 Correction Phase


A possible parse becomes a candidate for correction as
soon as it attempts to reduce over a ?. This ? can best be
regarded as a barrier across which the possible parse is
not continuous. The parse is continuous from its beginning
up to the ?1, since the parser got that far before
encountering its first error. The possible parse is
continuous between ?'s (if there is more than one) and is
continuous from the last ? to the point where the attempted
reduction took place. Thus, if the possible parse can be
made continuous across the ?'s, then it can be considered
corrected.


Remember that the possible parses were created on the
assumption that the input symbol at the error point was
correct. Each individual possible parse is based on the
additional assumption that the state to the immediate right
of the ? (the state from which that possible parse was
continued after the error) is a reasonable choice. Should
either of these assumptions be false, the correction should
not be possible within the fixed cost limit, resulting in
the possible parse being discarded. Ideally, if both
assumptions are true, then the minimal changes to the input
string that are necessary to correct the possible parse will
be found.

The basic idea is to find a way to get from the state
on the left of the barrier to the state on the right of the
barrier, while considering only insertions and deletions of
terminal symbols. There is actually a set of states on both
the left and the right of the barrier. A state can have
many ancestral states. For the scope of this text,
ancestral states are defined as follows:

A state S' is ancestral to state S if, given a
certain state stack with state S' on top, one or
more "reduce moves" could be performed which would
leave state S on top of the state stack.


Since the error recovery method will only be working with
insertions and deletions of terminals, it is sufficient to
work with terminal-entry ancestral states. For the scope of
this text, terminal-entry ancestral states are defined as
follows:

The set of terminal-entry ancestral (TEA) states
of state S is the set consisting of all ancestral
states of state S which are enterable on a terminal
symbol (as a result of a shift).
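TEA states can be computed with a reverse reachability search followed by a filter on each state's entry symbol. The sketch assumes the one-step reduce relation has been precomputed per state as a hypothetical table `reduces_to`. In a real LR parser this relation depends on the stack below the top, so a per-state table over-approximates it. `entry_symbol` is likewise hypothetical. In an LR(0), LR(1), or LALR automaton, every state except the start state is entered on a unique symbol, so such a map is well defined. The sketch follows the definition literally and does not include S itself.

```python
from collections import deque
from typing import Dict, Set


def ancestral_states(s: int, reduces_to: Dict[int, Set[int]]) -> Set[int]:
    """States S' from which one or more reduce moves can leave `s` on top.

    `reduces_to[p]` is a hypothetical precomputed set of states that a
    single reduce move from state p can leave on top of the stack.
    """
    # Invert the one-step relation: parents[t] = {p : p reduces to t}.
    parents: Dict[int, Set[int]] = {}
    for p, targets in reduces_to.items():
        for t in targets:
            parents.setdefault(t, set()).add(p)
    # Breadth-first search backwards from s collects the transitive closure.
    seen: Set[int] = set()
    queue = deque([s])
    while queue:
        t = queue.popleft()
        for p in parents.get(t, ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def tea_states(s: int, reduces_to: Dict[int, Set[int]],
               entry_symbol: Dict[int, str], terminals: Set[str]) -> Set[int]:
    """Ancestral states of `s` that are entered by shifting a terminal."""
    return {a for a in ancestral_states(s, reduces_to)
            if entry_symbol[a] in terminals}
```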


The set of TEA states to the immediate left of the
barrier will be called the leftstates, and the set of TEA
states to the immediate right of the barrier will be called
the rightstates. The objective is to find a "simple way" to
get from one of the leftstates to one of the rightstates.
This "simple way" is the insertion of a single terminal,
though the method could be extended to consider the
insertion of multiple terminals. If the single insertion
will not break the barrier, it is assumed that the error
occurred at an earlier point in the input string, and an
attempt is made to back the barrier up one symbol. The
symbol to be backed over is the previous symbol in the input
string, which is always known since the original input
string is saved in tokenized form. Backing the barrier over
this terminal means backing up both the leftstates and the
rightstates over the terminal. Backing a set of states over
a terminal has the following effect: the set of states
after the backup consists of all states from which a shift
(on the terminal) can be made to one of the states that was
in the set prior to the backup.
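Backing a set of states over a terminal is a reverse lookup in the shift transitions. The minimal sketch below assumes the shift part of the ACTION table is available as a dictionary `shift`, which is a hypothetical representation.

```python
from typing import Dict, Set, Tuple

Shift = Dict[Tuple[int, str], int]  # hypothetical: (state, terminal) -> state


def back_up(states: Set[int], terminal: str, shift: Shift) -> Set[int]:
    """All states from which shifting `terminal` enters one of `states`.

    Members of `states` with no predecessor on `terminal` contribute
    nothing, so states that cannot back over the terminal drop out.
    """
    return {p for (p, t), q in shift.items() if t == terminal and q in states}
```

A real implementation would probably precompute the reverse transitions once rather than scanning the table on every backup.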

All of the leftstates which can be backed up over the
terminal are backed up, and those that cannot be backed up
over the terminal are discarded. At least one of the
leftstates will always be able to back over the terminal,
since one of them is the state that was in the actual parse
before the error was detected.

If one or more of the rightstates can also back over
the terminal, then those that can are backed up, and the
barrier has successfully been backed up one symbol without
accumulating any cost. The process then repeats itself by
alternately looking for a "simple way" (a single terminal
insertion) to break the barrier and then backing up the
barrier. This continues until either the barrier is broken
(the possible parse is considered corrected) or a total cost
greater than the fixed limit is accumulated.

However, it is very likely that none of the rightstates
will be able to back over the previous terminal from the
input string. Remember that each possible parse is based on
the assumption that the state that was assumed when setting
up the possible parse (the original rightstate) was a
reasonable choice. Therefore, if the possible parse is
still assumed a reasonable one, but none of the rightstates
can back over the terminal from the input string, then that
terminal must be an incorrect one. The terminal is
considered deleted as far as the possible parse is
concerned, and the cost of its deletion is added to the
total cost being accumulated for the possible parse. Even
though the rightstates are unsuccessful in backing up,
the barrier itself has been backed up because of this
deletion. The process then continues, provided that the total
cost accumulated is still less than the fixed limit.
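The alternation of single-terminal insertion attempts and barrier backups, with a deletion charged whenever no rightstate can back over a terminal, might be sketched as follows. This is a simplification under stated assumptions: it ignores backing over nonterminals, the TEA state updates and the "true" leftstate checks, and it borrows the constant cost of 10 and limit of 31 used for the sample programs in Section 5. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Edit = Tuple[str, str, int]  # (kind, terminal, barrier position)


def back_up(states: Set[int], symbol: str, entry_symbol: Dict[int, str],
            predecessors: Dict[int, Set[int]]) -> Set[int]:
    """All states with a transition on `symbol` into a member of `states`."""
    result: Set[int] = set()
    for s in states:
        if entry_symbol.get(s) == symbol:
            result |= predecessors.get(s, set())
    return result


def find_insertion(left: Set[int], right: Set[int], entry_symbol: Dict[int, str],
                   predecessors: Dict[int, Set[int]],
                   terminals: Set[str]) -> Optional[str]:
    """A terminal t such that some leftstate shifts on t into a rightstate."""
    for r in sorted(right):
        t = entry_symbol.get(r)
        if t in terminals and predecessors.get(r, set()) & left:
            return t
    return None


def correct(left: Set[int], right: Set[int], before: List[str],
            entry_symbol: Dict[int, str], predecessors: Dict[int, Set[int]],
            terminals: Set[str], cost: int = 10,
            limit: int = 31) -> Optional[Tuple[int, List[Edit]]]:
    """Alternate single-insertion attempts with barrier backups.

    `before` holds the saved tokens to the left of the barrier.
    Returns (total cost, edits), or None once the limit is exceeded.
    """
    total, edits, pos = 0, [], len(before)
    while True:
        t = find_insertion(left, right, entry_symbol, predecessors, terminals)
        if t is not None:                      # the barrier is broken
            total += cost
            if total > limit:
                return None
            edits.append(("insert", t, pos))
            return total, edits
        if pos == 0 or not left:
            return None
        pos -= 1
        tok = before[pos]
        left = back_up(left, tok, entry_symbol, predecessors)
        moved = back_up(right, tok, entry_symbol, predecessors)
        if moved:
            right = moved                      # barrier backed up at no cost
        else:
            total += cost                      # token treated as deleted
            edits.append(("delete", tok, pos))
            if total > limit:
                return None


# Toy automaton (hypothetical): 0 -s-> 1, 1 -;-> 2, 1 -x-> 5
E = {1: "s", 2: ";", 5: "x"}
P = {1: {0}, 2: {1}, 5: {1}}
T = {"s", ";", "x"}
print(correct({5}, {2}, ["s", "x"], E, P, T))
# (20, [('delete', 'x', 1), ('insert', ';', 1)])
```

In the example the stray "x" cannot be backed over by the rightstate, so it is deleted; the ";" insertion then breaks the barrier.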




Each time either the set of leftstates or the set of
rightstates is backed over a symbol, that set of states is
updated so that it only contains TEA states. In the case of
the set of rightstates (for reasons explained in the next
several paragraphs), the set of states is saved before the
update occurs. The set of rightstates prior to the update
will be referred to as the "true" rightstates.


There is a problem with nondeterminism both in backing
up the set of leftstates and in backing up the set of
rightstates.


At the beginning of the Correction Phase, the set of
leftstates consists of the TEA states of a single state,
which is the state that was on the top of the parsing stack
at the error point. Ideally, while backing up over the
input string, the leftstates should always consist of the
TEA states of a single state (the "true" leftstate). That
"true" leftstate should be the state that was on the top of
the parsing stack when the original forward parse was at the
same point in the input string. If that "true" leftstate
still exists in the parsing stack, then it can be
determined, since each state pushed on the parsing stack is
accompanied by a pointer into the tokenized input string.


The  nondeterminism  in  backing  up  the  set  of  leftstates 
only  occurs  when  backing  over  input  symbols  that  have 
already  been  reduced  into  a  nonterminal.  When  this  occurs, 
the  leftstates  are  only  a  set  of  possibilities,  and  the 
"true"  leftstate  will  not  be  known  until  backing  up  into  a 
state  which  still  exists  on  the  parsing  stack.  During  this 
period  of  uncertainty  about  the  validity  of  the  set  of 
leftstates,  a  break  of  the  barrier  is  not  necessarily  a 
correct  solution.  Each  time  a  way  is  found  to  break  the 
barrier,  the  leftstate  used  is  checked  to  guarantee  that  it 
is  a  "true"  leftstate  or  that  it  can  be  backed  up  into  a 
"true"  leftstate.  If  the  leftstate  used  does  not  satisfy 
this  check,  then  the  solution  is  ignored  and  that  leftstate 
is  discarded  from  the  set  of  leftstates. 

The nondeterminism encountered in backing up the set of
rightstates is not as easy to control as that encountered in
backing up the set of leftstates. If not checked, the set
of rightstates can expand very rapidly, and the
nondeterminism involved could lead to a potentially
confusing situation for the Correction Phase, which attempts
to find a simple way to get from one of the leftstates to
one of the rightstates.


This problem appears to be checked nicely by always
attempting to back the barrier over a nonterminal before
attempting to back it over the previous terminal in the
input string. This can be done whenever three conditions
are satisfied. The "true" leftstate must be known, that
"true" leftstate must be enterable on a nonterminal, and at
least one of the "true" rightstates must be enterable on
that same nonterminal. When these three conditions are
satisfied, the "true" leftstate is backed over this
nonterminal into another "true" leftstate. Those "true"
rightstates that can be backed up over the same nonterminal
are backed up, and those that cannot are discarded. There is
no problem in determining the point in the input string to
which the barrier has been backed up, since there is still a
"true" leftstate which exists on the parsing stack and which
is accompanied by a pointer into the tokenized input string.
If any one of the three conditions is not satisfied, then an
attempt is made to back the barrier over the previous
terminal in the input string in the manner previously
described.
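The three-condition test for backing over a nonterminal could look like this in Python. The sketch assumes the "true" leftstate is known exactly when it is still on the parsing stack, so the top of the stack plays that role and the state below it becomes the new "true" leftstate. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def back_over_nonterminal(
    stack: List[int],
    true_right: Set[int],
    entry_symbol: Dict[int, str],
    predecessors: Dict[int, Set[int]],
    nonterminals: Set[str],
) -> Optional[Tuple[List[int], Set[int]]]:
    """Try to back the barrier over a nonterminal.

    `stack` is the part of the parsing stack below the barrier; its top
    is the known "true" leftstate. Returns the shortened stack and the
    backed-up rightstates, or None if any condition fails (the caller
    then backs over the previous terminal instead).
    """
    if not stack:                                  # 1: true leftstate known
        return None
    nt = entry_symbol.get(stack[-1])
    if nt not in nonterminals:                     # 2: entered on a nonterminal
        return None
    survivors = {r for r in true_right if entry_symbol.get(r) == nt}
    if not survivors:                              # 3: a rightstate agrees
        return None
    new_right: Set[int] = set()
    for r in survivors:                            # the others are discarded
        new_right |= predecessors.get(r, set())
    return stack[:-1], new_right


# Toy automaton (hypothetical):
# 0 -<stmt-list>-> 1, 1 -;-> 2, 2 -<stmt-list>-> 3, 2 -s-> 4
E = {1: "<stmt-list>", 2: ";", 3: "<stmt-list>", 4: "s"}
P = {1: {0}, 2: {1}, 3: {2}, 4: {2}}
NT = {"<stmt-list>"}
print(back_over_nonterminal([0, 1], {3, 4}, E, P, NT))
```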

In all the examples we tested, backing over a
nonterminal whenever possible prevented the Correction Phase
from getting into trouble due to the nondeterminism involved
in backing up the rightstates. In some cases it would have
run into trouble had it only backed up over the terminals in
the input string. Though this method of controlling the
nondeterminism problem worked very well, it is not clear
that it is sufficient to handle any possible case that could
be contrived.

There is an additional advantage to backing over a
nonterminal whenever possible: it is much more efficient
than backing over the individual terminals which have
already been reduced into the nonterminal. For example,
backing over a nonterminal <statement> is much more
efficient than backing over each of the individual
terminals of that statement.


Eventually the process will complete. If it ends by
accumulating too great a cost, then it has failed, and as a
result the possible parse will be discarded. If it ends by
breaking the barrier, then the possible parse has been
corrected. Taking the corrections into consideration, the
input string is reparsed beginning with the "true" leftstate
closest to the barrier which has not seen the point in the
input string where the earliest correction was made.

Although  only  insertions  and  deletions  of  terminals  are 
considered  at  any  point,  the  error  messages  may  suggest 
changing  one  terminal  to  another.  Each  time  an  insertion  is 
to  be  made  for  a  possible  parse,  a  check  is  made  to  see  if 
the  terminal  is  to  be  inserted  next  to  a  terminal  which  the 
Correction  Phase  has  just  deleted  for  the  same  possible 
parse.  If  so,  the  corrections  are  combined  into  one  and  the 
cost  accumulated  for  that  possible  parse  is  changed  to 
reflect  only  the  maximum  of  the  two  individual  costs. 
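A sketch of this insertion/deletion merge, assuming corrections are recorded in order together with a 0-based position in the saved token string (the `Edit` record and the `add_insertion` helper are hypothetical). With the constant costs used in Section 5, the merged change costs 10 rather than 20.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Edit:
    kind: str   # "insert", "delete" or "change"
    old: str    # terminal removed ("" for a pure insertion)
    new: str    # terminal added ("" for a pure deletion)
    index: int  # position in the saved token string
    cost: int


def add_insertion(edits: List[Edit], total: int, terminal: str,
                  index: int, ins_cost: int) -> int:
    """Record an insertion and return the updated total cost.

    If the insertion lands where the Correction Phase has just deleted
    a terminal, the two are merged into one change whose cost is the
    maximum of the two individual costs.
    """
    last = edits[-1] if edits else None
    if last is not None and last.kind == "delete" and last.index == index:
        merged = max(last.cost, ins_cost)
        edits[-1] = Edit("change", last.old, terminal, index, merged)
        return total - last.cost + merged
    edits.append(Edit("insert", "", terminal, index, ins_cost))
    return total + ins_cost


# "TOGO" was just deleted; "GOTO" is then inserted in the same place.
edits = [Edit("delete", "TOGO", "", 0, 10)]
total = add_insertion(edits, 10, "GOTO", 0, 10)
print(total, edits[0].kind, edits[0].old, edits[0].new)  # 10 change TOGO GOTO
```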


4.5  Special  Tables 

The error recovery method requires three special tables
other than the normal parsing tables required for an LR
parser.


1.)  The  Legal  State  Table. 

2.)  The  Predecessor  States  Table. 

3.)  The  TEA  States  Table. 

The Legal State Table is an indexed sequential table
which is indexed by terminals of the language. For each
terminal, the table contains all states from which the
parser action with that terminal as an input symbol is a
shift. The Legal State Table is used in setting up the
possible parses.

The Predecessor States Table is really two indexed
sequential tables which are indexed by the states of the LR
parser. For each state, the first table contains the symbol
that the state is enterable on, and the second table
contains all states from which the state could be entered.
The Predecessor States Table is used in backing up a set of
states over the input string as well as in creating the TEA
States Table.

The TEA States Table is an indexed sequential table
which is indexed by the states of the LR parser. For each
state, the table contains all TEA states. This table is
also used in backing up a set of states over the input
string.


All  three  of  these  tables  are   automatically   generated 
directly  from  the  tables  of  the  LR  parser. 
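One way the Legal State Table and the two Predecessor States Tables could be derived from ordinary LR action and goto tables is sketched below in Python. The table encodings are assumptions made for illustration; the sketch relies on each LR state being entered on exactly one symbol and does not attempt the TEA States Table.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Action = Tuple[str, int]  # ("shift", state), ("reduce", production), ...


def build_tables(action: Dict[Tuple[int, str], Action],
                 goto: Dict[Tuple[int, str], int]):
    """Return (legal, entry_symbol, predecessors).

    legal[t]        : states whose action on terminal t is a shift
    entry_symbol[s] : the one symbol on which state s is entered
    predecessors[s] : states from which s can be entered
    """
    legal: Dict[str, Set[int]] = defaultdict(set)
    entry_symbol: Dict[int, str] = {}
    predecessors: Dict[int, Set[int]] = defaultdict(set)
    for (state, terminal), (kind, target) in action.items():
        if kind == "shift":
            legal[terminal].add(state)
            entry_symbol[target] = terminal
            predecessors[target].add(state)
    for (state, nonterminal), target in goto.items():
        entry_symbol[target] = nonterminal
        predecessors[target].add(state)
    return dict(legal), entry_symbol, dict(predecessors)


# Toy tables (hypothetical): 0 -a-> 1, 2 -a-> 1, 1 -;-> 2, goto(0, S) = 3
action = {(0, "a"): ("shift", 1), (2, "a"): ("shift", 1),
          (1, ";"): ("shift", 2), (1, "$"): ("reduce", 0)}
goto = {(0, "S"): 3}
legal, entry, preds = build_tables(action, goto)
```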

4.6  Modifications  to  LR  Parser

This error recovery method requires some modifications
to the original LR parser. The error recovery method
depends on having access to the original input string in
tokenized form. Therefore the lexical analysis phase must
save each token that it processes. If the original input
string were ordinarily available and there were a desire to
reduce overhead for the parsing of completely correct
programs, then the error routine could re-tokenize when it
needed a symbol. However, for incorrect programs, it would
be much more efficient to save the input string in tokenized
form.

A problem is presented by the fact that the error
recovery method can decide to back up and restart a set of
possible parses. This requires the lexical analysis phase
to check whether the next token it is supposed to process has
already been consumed. If it has already been consumed,
then it can be found in the tokenized input string that was
saved.
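A minimal sketch of a token buffer that lets the lexical analysis phase hand back already-consumed tokens after the error routine restarts a set of possible parses. `TokenBuffer`, its methods and the `$` end marker are hypothetical, and changes made to the saved string by corrections are not modeled.

```python
from typing import Iterable, Iterator, List

EOF = "$"  # hypothetical end-of-input token


class TokenBuffer:
    """Saves every token the scanner produces so the parser can back up.

    `scanner` stands in for the lexical analysis phase; any iterable of
    token strings will do for this sketch.
    """

    def __init__(self, scanner: Iterable[str]) -> None:
        self._scanner: Iterator[str] = iter(scanner)
        self.tokens: List[str] = []   # the saved tokenized input string
        self.pos = 0                  # index of the next token to hand out

    def next_token(self) -> str:
        if self.pos < len(self.tokens):
            tok = self.tokens[self.pos]        # already consumed: reuse it
        else:
            tok = next(self._scanner, EOF)     # new token: scan and save it
            self.tokens.append(tok)
        self.pos += 1
        return tok

    def restart_at(self, pos: int) -> None:
        """Back up so that the parse resumes at saved token `pos`."""
        self.pos = pos


buf = TokenBuffer(["X", ":", "=", "Y"])
print([buf.next_token() for _ in range(3)])  # ['X', ':', '=']
buf.restart_at(1)
print([buf.next_token() for _ in range(4)])  # [':', '=', 'Y', '$']
```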


Another problem is found in trying to make a
correspondence between the states on the parsing stack and
the tokenized input string. Suppose the error recovery
routine makes a change to the input string. The parse must
be resumed at the point of the change. However, there is a
problem in determining the state closest to the top of the
parsing stack which has not seen the point in the input
string where the change was made. Without this
correspondence between the parsing stack and the input
string, the input string would have to be reparsed entirely
after each correction. Therefore, each state that is pushed
onto the stack is accompanied by a pointer into the
tokenized input string.
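Assuming the pointer stored with each state records how many tokens had been read when the state was pushed (the exact encoding is an assumption here), finding where to resume after a change is a scan down from the top of the stack. The names below are hypothetical.

```python
from typing import List, Tuple

StackEntry = Tuple[int, int]  # (LR state, tokens consumed when pushed)


def resume_point(stack: List[StackEntry],
                 change_index: int) -> Tuple[List[StackEntry], int]:
    """Pop to the state closest to the top that has not seen the change.

    A state pushed after reading `consumed` tokens has seen tokens
    0..consumed-1, so a change at `change_index` does not affect it
    when consumed <= change_index. Returns the truncated stack and the
    token index at which reparsing starts.
    """
    for depth in range(len(stack) - 1, -1, -1):
        _state, consumed = stack[depth]
        if consumed <= change_index:
            return stack[:depth + 1], consumed
    return [], 0


stack = [(0, 0), (4, 1), (7, 3), (9, 5)]
print(resume_point(stack, 3))  # ([(0, 0), (4, 1), (7, 3)], 3)
```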

5.   IMPLEMENTATION  AND  SAMPLE  PROGRAMS


The error recovery method was implemented with an LR
parser for a language whose BNF is listed in Appendix A.
The LR parser has 356 states. The implementation is written
in Pascal [4,13] and was run on a DEC-10 timesharing system.
The sample programs were all run with a constant cost of 10
for the insertion or deletion of any terminal in the
language. The cost limit for each correction attempt of a
possible parse was set at 31.
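With these settings a single correction attempt can absorb at most three insertions or deletions: 3 * 10 = 30 stays within 31, while 4 * 10 = 40 exceeds it. Because a combined "change" is charged the maximum of its two costs, it counts as one correction here. The quick check below just restates that arithmetic; the names are illustrative.

```python
COST = 10    # constant insertion/deletion cost used for every terminal
LIMIT = 31   # cost limit for one correction attempt of a possible parse


def within_limit(n_corrections: int, cost: int = COST, limit: int = LIMIT) -> bool:
    """An attempt fails once its accumulated cost exceeds the limit."""
    return n_corrections * cost <= limit


max_corrections = LIMIT // COST
print(max_corrections)                   # 3
print(within_limit(3), within_limit(4))  # True False
```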


The remainder of this section consists of a discussion
of the results from ten sample programs. Sample program #1
has a missing statement terminator, which is solved by a
simple insertion. Sample program #2 also has a missing
statement terminator, but it is in such a context that the
error routine cannot determine which of two possible
solutions is the most likely. In sample program #3, the
error routine provides a single solution with the option of
inserting either of two symbols at a specific point in the
input string. Three reasonable solutions, including two not
so obvious ones, are provided for sample program #4. This
example demonstrates consecutive insertion and deletion
corrections being combined into a single "change"
correction. In sample program #5, the error is detected
immediately, and consequently the possible parses are set up
for a symbol that should be deleted. The error routine must
discard all of the possible parses, delete the incorrect
symbol, and set up another set of possible parses before
finding a reasonable solution. Sample program #6 contains
an "if" statement that is missing the "IF", and the error
routine correctly inserts it. This example demonstrates the
advantages of an unbounded look-ahead scheme. Sample
program #7 contains an expression with three unmatched left
parentheses and requires the error routine to call itself
recursively in order to provide a solution. In sample
program #8, the restriction of not considering the insertion
of multiple terminals as a "simple way" to break the barrier
prevents the error routine from finding an obvious solution,
but it continues and finds another equally acceptable
solution. In sample program #9, the same restriction
prevents the error routine from finding any reasonable
solutions. This example demonstrates the problems that
arise when the error routine is unable to provide a
solution. This is the only one of the ten sample programs
for which the error routine is unable to provide at least
one reasonable solution. Finally, to end on a positive
note, sample program #10 is presented, and it demonstrates
the error routine providing several good solutions.


Sample  Program  #1 

A B C[20].
READ A B C[20]
WRITE  A  B; 
WRITE  C[ 20  ];  . 


The configuration of the parser at the point of error
detection is as follows:

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list> READ
   <input-list> <identifier> [ <expression> ] ?1

Input Symbol:

   "WRITE" in line #3


The error is detected at this point because the symbol
"WRITE" is not a legal right context to make the
reduction <subscripted-variable> => <identifier> [ <expression> ].
The possible parses are set up for the symbol "WRITE". The
error recovery method supplies one solution.

The possible parses which yield solution #1 are in the
following configuration when they attempt to reduce over the
error point.

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list> READ
   <input-list> <identifier> [ <expression> ]
   ?1 <statement> ;

Input Symbol:

   "WRITE" in line #4


The attempted reduction is:

   <statement-list> => <statement-list> <statement> ;

The Correction Phase immediately finds that the insertion of
a ";" will break the ?1 barrier and yields the solution:

   INSERT ";" AFTER "]" WHICH IS TOKEN #7 IN LINE #2
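The token numbers in these messages appear to count tokens within a line, with ":=" treated as two tokens (the stack listings show ": =", and Sample Program #3 reports "=" as token #8). A small Python sketch that reproduces the numbering under that assumption follows; the regular expression is hypothetical and is not the implementation's scanner.

```python
import re
from typing import List, Tuple

# Assumed token classes: identifiers/keywords, digit strings, and any other
# single non-blank character (so ":=" counts as the two tokens ":" and "=").
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|\d+|\S")


def number_tokens(program: str) -> List[Tuple[int, int, str]]:
    """Return (line number, token number within line, token text), 1-based."""
    numbered = []
    for line_no, line in enumerate(program.splitlines(), start=1):
        for tok_no, match in enumerate(TOKEN.finditer(line), start=1):
            numbered.append((line_no, tok_no, match.group()))
    return numbered


sample1 = "A B C[20].\nREAD A B C[20]\nWRITE A B;\nWRITE C[20]; ."
print([t for t in number_tokens(sample1) if t[0] == 2][-1])  # (2, 7, ']')
```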

Sample Program #2

A  X  Y. 
READ  A 
X  :=  Y;  . 


The configuration of the parser at the point of error
detection is as follows:

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list> READ
   <input-list> <identifier> ?1

Input Symbol:

   ":"


The error is detected at this point because, with this stack
configuration, the symbol ":" is not a legal right context
to make the reduction <variable> => <identifier>. (Note
that this is an LR, not an SLR [2], parser.) The possible
parses are set up for the symbol ":". The error recovery
method supplies two solutions.


The possible parse which yields solution #1 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list> READ
   <input-list> <identifier> ?1 : =

Input Symbol:

   "Y" in line #3

The attempted reduction is:

   <left-part> => <identifier> : =

The Correction Phase finds no "simple way" to break the
original ?1 barrier. It successfully backs the barrier over
the symbol "X", leaving the possible parse in the following
configuration:

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list> READ
   <input-list> ?1 <identifier> : =

Input Symbol:

   "Y" in line #3


It then finds that the insertion of a ";" will break
the barrier and yields the solution:

   INSERT ";" AFTER "A" WHICH IS TOKEN #2 IN LINE #2

The possible parse which yields solution #2 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list> READ
   <input-list> <identifier> ?1 : =

Input Symbol:

   "Y" in line #3



The attempted reduction is:

   <left-part> => <identifier> : =

The Correction Phase finds no "simple way" to break the
original ?1 barrier. It successfully backs the barrier over
the symbol "X". It then finds that the insertion of a "["
will break the barrier and yields the solution:

   INSERT "[" AFTER "A" WHICH IS TOKEN #2 IN LINE #2


Both solutions have the same cost attributed to them,
and the error recovery method arbitrarily chooses solution
#1. For this sample program, the error recovery method
provides one correct solution and one incorrect solution.
Since both solutions have the same cost attributed to them,
it chooses the first one, which happens to be the correct
solution. This is a case where the error recovery method
almost gets fooled by a language construct being legal in
more than one context. According to the language
description (Appendix A), an assignment statement can appear
as the subscript of an array. The second solution is based
on the assumption that the "X:=Y" is a subscript, and that
the "read" statement was intended to be "READ A[X:=Y]". If
there is a "]" between the "Y" and the ";", then solution #2
is the most likely. If not, then solution #1 is the most
likely. The problem is that the error recovery routine must
make a decision without knowing what symbols are to the
right of the "Y". This is a case where our method of
parsing until an attempt is made to reduce over the error
point does not provide sufficient look-ahead to decide
between the possibilities. Luckily, it chooses the correct
solution. If it chose the incorrect one, it would insert
the "[" after the "A", return to the regular parser,
continue the parse until discovering that the corresponding
"]" was missing, and call the error routine again, at which
point the "]" would be inserted. If the original program
was


A  X  Y. 

READ  A  X:=Y];  . 


then the error routine would insert a ";" after the "A" in
line #2 and return to the regular parser. The regular
parser would encounter another error and call the error
routine with the "]" as the input symbol. The error routine
would set up a set of possible parses for the symbol "]",
and proceed to find the solution of "changing" the ";" (that
it just inserted) to a "[".


Sample Program #3

A[  20]    B    X. 

X := A[B:=];

WRITE    X;    . 


The configuration of the parser at the point of error
detection is as follows:

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list>
   <left-part> <identifier> [ <identifier> : = ?1

Input Symbol:

   "]" in line #2


The error is detected at this point because the symbol "]"
is not a legal right context to make the reduction
<left-part> => <identifier> : =. The possible parses are
set up for the symbol "]". The error recovery method
supplies one solution that contains an option of two
symbols.

The possible parses which yield solution #1 are in the
following configuration when they attempt to reduce over the
error point.

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list>
   <left-part> <identifier> [ <identifier> : =
   ?1 ]

Input Symbol:

   ";" in line #2


The attempted reduction is:

   <subscripted-var> => <identifier> [ <assignment> ]

The Correction Phase immediately finds that the insertion of
either an identifier or a string of digits will break the ?1
barrier and yields the solution:

   INSERT "identifier" or "digits" AFTER "=" WHICH IS
   TOKEN #8 IN LINE #2.


Sample Program #4


A  B. 
TOGO  A; 
READ  B;  . 


The configuration of the parser at the point of error
detection is as follows:

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list>
   <identifier> ?1

Input Symbol:

   "A" in line #2


The error is detected at this point because a statement
cannot start with two consecutive identifiers. The possible
parses are set up for the identifier "A". The error
recovery method supplies three solutions.

The possible parse which yields solution #1 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list>
   <identifier> ?1 <identifier>

Input Symbol:

   ";" in line #2


The attempted reduction is:

   <statement> => GOTO <identifier>

The Correction Phase finds no "simple way" to break the ?1
barrier. It attempts to back the barrier over the symbol
"TOGO", but to do so must delete the "TOGO". It then finds
that the insertion of the keyword "GOTO" will break the
barrier. The consecutive insertion and deletion are
combined into a change, and the solution emitted is:

   CHANGE "TOGO" WHICH IS TOKEN #1 IN LINE #2 TO "GOTO"


The possible parse which yields solution #2 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list>
   <identifier> ?1 <variable>

Input Symbol:

   ";" in line #2


The attempted reduction is:

   <input-list> => <input-list> <variable>

The Correction Phase finds no "simple way" to break the
original barrier. It successfully backs the barrier over
the symbol "TOGO". It then finds that the insertion of the
keyword "READ" will break the barrier and yields the
solution:

   INSERT "READ" AFTER "." WHICH IS TOKEN #3 IN LINE #1

The possible parse which yields solution #3 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list>
   <identifier> ?1 <variable>

Input Symbol:

   ";" in line #2


The attempted reduction is:

   <output-list> => <output-list> <variable>


The Correction Phase finds no "simple way" to break the ?1
barrier. It successfully backs the barrier over the symbol
"TOGO". It then finds that the insertion of the keyword
"WRITE" will break the barrier and yields the solution:

   INSERT "WRITE" AFTER "." WHICH IS TOKEN #3 IN LINE #1


Solution #1 makes an insertion and a deletion, but they
are combined into a single change at the cost of the maximum
of the costs of the two individual corrections. Since all
of the sample programs were run with constant insertion and
deletion costs for every terminal, all three possible
solutions have the same cost attributed to them. The error
recovery method arbitrarily chooses solution #1. Note that
the solutions provided have nothing to do with the
similarity between the spelling of the symbols "TOGO" and
"GOTO". The same solutions will be provided if "TOGO" is
spelled "WXYZ" or "PEED".
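When several solutions share the lowest cost, the method takes the first one found. A sketch of that selection rule, assuming solutions are collected in the order their possible parses reach the error point (`choose` is a hypothetical helper). Python's `min()` returns the first of several equal minima, which models the arbitrary choice of solution #1.

```python
from typing import List, Tuple

Solution = Tuple[int, str]  # (accumulated cost, correction message)


def choose(solutions: List[Solution]) -> Solution:
    """Pick a minimum-cost solution; ties go to the one found first."""
    return min(solutions, key=lambda s: s[0])


found = [
    (10, 'CHANGE "TOGO" WHICH IS TOKEN #1 IN LINE #2 TO "GOTO"'),
    (10, 'INSERT "READ" AFTER "." WHICH IS TOKEN #3 IN LINE #1'),
    (10, 'INSERT "WRITE" AFTER "." WHICH IS TOKEN #3 IN LINE #1'),
]
print(choose(found)[1])  # the CHANGE message, i.e. solution #1
```

Nothing in the cost depends on how the deleted token is spelled, which is why the same three equal-cost solutions appear for "WXYZ".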


Sample Program #5

X. 

X  ;=  2; 

WRITE  X;  . 


The configuration of the parser at the point of error
detection is as follows:

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list>
   <identifier> ?1

Input Symbol:

   ";" token #2 in line #2


The error is detected at this point because a statement
cannot start with an identifier followed by a ";". The
possible parses are set up for the symbol ";". However, all
of the possible parses encounter an error on the very next
symbol ("="). The ";" is deleted, the possible parses are
set up again, and the error routine is restarted. The error
recovery method supplies one solution.

The possible parse which yields solution #1 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list>
   <identifier> ?1 =

Input Symbol:

   "2" in line #2


The attempted reduction is:

   <left-part> => <identifier> : =

The Correction Phase immediately finds that the insertion of
a ":" will break the ?1 barrier and yields the solution:

   DELETE ";" WHICH IS TOKEN #2 IN LINE #2

   INSERT ":" AFTER "X" WHICH IS TOKEN #1 IN LINE #2


In this sample program, the consecutive insertion and
deletion are not combined into a single change. This is
because the "DELETE ;" message originates from the
discarding of the first set of possible parses, and
consequently its cost is not represented in the total cost
of the possible parse for which the final correction is
found.


Sample Program #6

x  y  z. 

X=Y  THEN  Z:  =  0  ELSE  Z:=Z*1; 
WRITE  Z;  . 


The configuration of the parser at the point of error
detection is as follows:

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list>
   <identifier> ?1

Input Symbol:

   "=" token #2 in line #2


The error is detected at this point because a statement
cannot start with an identifier followed by an "=". The
possible parses are set up for the symbol "=". The error
recovery method supplies one solution.

The possible parse which yields solution #1 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:

   <declaration-list> . <statement-list>
   <identifier> ?1 <relational-op> <expression>

Input Symbol:

   "THEN"


The attempted reduction is:

   <boolean-expr> => <expression> <relational-op> <expression>

The Correction Phase finds no "simple way" to break the ?1
barrier. It successfully backs the barrier over the symbol
"X". It then finds that the insertion of the keyword "IF"
will break the barrier and yields the solution:

   INSERT "IF" AFTER "." WHICH IS TOKEN #4 IN LINE #1


This sample program is an example of why unbounded
look-ahead is so important and why this method continues
parallel parsing until every possible parse has either been
discarded or attempted to reduce over the error point,
regardless of how many possible solutions have been found.
In this case, there is another possible parse which thinks
it is parsing an assignment statement. It attempts to
reduce over the error point (using the production
<left-part> => <identifier> : =) immediately after shifting
the "=", and finds the first solution of inserting a ":"
after the "X" in line #2. Meanwhile the possible parse
which eventually provides the correct solution thinks it is
parsing an "if" statement, but has not yet attempted to
reduce over the error point. If the statement was intended
to be an assignment statement, then the first solution is
correct. If the statement was intended to be an "if"
statement, as is apparently the case here, then the first
solution is wrong. There is no way of knowing which is the
case until the symbol immediately after the expression to
the right of the "=" is known. In this case it is a "THEN",
and the first solution is wrong. Since another possible
parse is still parsing, the possible parse of the first
solution must continue also. Upon seeing the "THEN", the
possible parse of the first solution encounters an error and
is discarded. At this point, the single remaining possible
parse attempts to reduce over the error point and provides
the final solution.


The importance of unbounded look-ahead is demonstrated
by the fact that the expression to the right of the "=" (in
this case a single "Y") can be of any length. If the
look-ahead is bounded, and the length of the expression is
greater than the bound, then the symbol following the
expression will not be known, and no error recovery method
can determine which statement was most likely intended by
the programmer.
Sample Program #7

    X Y Z.
    X := ((Y + (Z*5;
    WRITE X; .


The configuration of the parser at the point of error
detection is as follows:

Symbols Implied By States On Parsing Stack:
    <declaration-list> . <statement-list>
    <left-part> ( ( <expression> + ( <term> *
    <digits> ?1

Input Symbol:
    ";" in line #2


The error is detected at this point because, with this stack
configuration, the symbol ";" is not a legal right context
for making the reduction <primary> => <digits>. The possible
parses are set up for the symbol ";". The error recovery
method supplies one solution.

The possible parse which yields solution #1 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:
    <declaration-list> . <statement-list>
    <left-part> ( ( <expression> + ( <term> *
    <digits> ?1 ;

Input Symbol:
    "WRITE"
The attempted reduction is:

    <statement-list> => <statement-list> <statement> ;


The Correction Phase finds that the insertion of a ")" will
break the barrier. In this case, though, there are three
unmatched left parentheses, and inserting a single right
parenthesis does not provide a correct solution. This is a
case where the Correction Phase is temporarily fooled. The
Correction Phase inserts the ")" and attempts to reparse
back up to the symbol which was the input symbol when the
attempt was made to reduce over the error point. However,
the possible parse encounters an error before reparsing to
that symbol (the ";" in line #2), and is temporarily
discarded. No other possible parse either "shifts" on the
";" or attempts to reduce over the error point and provides
a solution which allows reparsing any further than the
discarded one did. Therefore, the possible parses discarded
in this way (two in this example) are re-instated, and the
error routine is called recursively from the point at which
they encountered the error during the reparse attempt.

The process then repeats itself, with the Correction
Phase breaking the barrier by inserting a ")", but then
encountering an error on the reparse. Again the error
routine is called recursively, and again the Correction
Phase breaks the barrier by inserting a ")". This time the
reparse finally succeeds, and the error routine yields the
solution:

    INSERT ")" AFTER "5" WHICH IS TOKEN #11 IN LINE #2
    INSERT ")" AFTER "5" WHICH IS TOKEN #11 IN LINE #2
    INSERT ")" AFTER "5" WHICH IS TOKEN #11 IN LINE #2


This sample program demonstrates that the error
recovery method can handle any number of unmatched left
parentheses, as long as no restriction is placed on the
number of levels of recursion allowed.
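The recursion in Sample Program #7 can be mimicked by a small toy sketch. It assumes a drastically simplified "reparse" that merely counts the left parentheses still open when the ";" is reached; the real method reparses with the LR parsing tables, and the function names below are hypothetical. Each call inserts one ")" immediately after the "5" and, if the toy reparse still fails, calls itself again. The optional `max_level` parameter stands in for a restriction on the number of levels of recursion.

```python
from typing import List, Optional, Tuple


def open_parens_at_semicolon(tokens: List[str]) -> int:
    """Toy stand-in for a reparse: count '(' still open when ';' is reached."""
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif tok == ";":
            break
    return depth


def repair_left_parens(
    tokens: List[str], pos: int, level: int = 0, max_level: Optional[int] = None
) -> Optional[Tuple[List[str], List[str]]]:
    """Insert one ')' before tokens[pos]; recurse while the toy reparse fails.

    Returns (repaired tokens, messages), or None if the repair is impossible
    within the allowed number of recursion levels.
    """
    candidate = tokens[:pos] + [")"] + tokens[pos:]
    message = f'INSERT ")" AFTER "{tokens[pos - 1]}"'
    remaining = open_parens_at_semicolon(candidate)
    if remaining == 0:
        return candidate, [message]  # the toy reparse now gets past the ';'
    if remaining < 0 or (max_level is not None and level >= max_level):
        return None  # over-corrected, or no further recursion allowed
    # The inserted ')' sits at pos, so the next one goes in front of it,
    # i.e. still immediately after the same original token ("5").
    inner = repair_left_parens(candidate, pos, level + 1, max_level)
    if inner is None:
        return None
    repaired, messages = inner
    return repaired, [message] + messages


tokens = "X : = ( ( Y + ( Z * 5 ;".split()  # line #2 of Sample Program #7
result = repair_left_parens(tokens, tokens.index(";"))
if result is not None:
    repaired, messages = result
    print(" ".join(repaired))   # X : = ( ( Y + ( Z * 5 ) ) ) ;
    print("\n".join(messages))  # three identical INSERT messages
print(repair_left_parens(tokens, tokens.index(";"), max_level=1))  # None
```

With `max_level=1` only two right parentheses can be inserted, so no solution is found; this is the effect of restricting the levels of recursion.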




This error recovery method can also handle any number
of unmatched right parentheses. This case is much less
complicated, since it does not require any recursion. Each
unmatched right parenthesis is encountered one at a time,
and as each one is encountered, the corresponding left
parenthesis is inserted.
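By contrast, a toy scan for unmatched right parentheses needs no recursion: each one is reported as soon as it is reached, and scanning carries on as if its left partner had been inserted. This is a hypothetical sketch (the function name is illustrative); it does not model where the Correction Phase actually places the inserted "(", only the fact that each unmatched ")" is detected and handled separately.

```python
from typing import List


def unmatched_right_parens(tokens: List[str]) -> List[int]:
    """Return the index of each ')' that has no open '(' when it is reached.

    Each one is reported at the moment the scan reaches it, as a separate
    error; the scan then continues as though the matching '(' had been
    inserted, so no recursion or backtracking is needed.
    """
    depth = 0
    errors = []
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            if depth == 0:
                errors.append(i)  # error detected here; treat '(' as inserted
            else:
                depth -= 1
    return errors


print(unmatched_right_parens("X : = Y + Z ) * 5 ) ;".split()))  # [6, 9]
```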
Sample Program #8

    A X.
    BEGIN
    READ A;
    X := 2*A;
    WRITE X; .


The configuration of the parser at the point of error
detection is as follows:

Symbols Implied By States On Parsing Stack:
    <declaration-list> . <statement-list>
    BEGIN <statement-list> <statement> ; ?1

Input Symbol:
    "."


The error is detected at this point because, with this stack
configuration, the symbol "." is not a legal right context
to make the reduction <statement-list> => <statement-list>
<statement> ;. The possible parses are set up for the
symbol ".". The error recovery method supplies one
solution.

The possible parse which yields solution #1 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:
    <declaration-list> . <statement-list>
    BEGIN <statement-list> <statement> ; ?1 .

Input Symbol:
    end-of-file
The attempted reduction is:

    <program> => <declaration-list> . <statement-list> .

The Correction Phase finds no "simple way" to break the ?1
barrier. It then successfully backs the barrier over the
";" in line #5. It again finds no "simple way" to break the
barrier. This time it finds that the "true" leftstate and
one of the "true" rightstates are both enterable on the
nonterminal <statement>. The barrier is successfully backed
over the nonterminal <statement> (which originally consisted
of "WRITE X"). There is still no "simple way" to break the
barrier, and the barrier is again backed up. This time the
"true" leftstate and one of the "true" rightstates are both
enterable on the nonterminal <statement-list>. The barrier
is successfully backed over the nonterminal <statement-list>
(which originally consisted of "READ A; X := 2*A;"). Once
again there is no "simple way" to break the barrier. The
Correction Phase then attempts to back the barrier over the
symbol "BEGIN", but to do so must delete the "BEGIN". At
this point, it realizes that deleting the "BEGIN" has
broken the barrier, and it yields the solution:

    DELETE "BEGIN" WHICH IS TOKEN #1 IN LINE #2
The Correction Phase does not insert an "END" instead of
deleting the "BEGIN" because, according to the syntax of the
language [Appendix A], it would need to insert "END ;".
Since the method was implemented so that it considers a
"simple way" to break the barrier to be the insertion of a
single terminal, it cannot insert "END ;" and must continue
backing up until it deletes the "BEGIN".
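The limitation can be reproduced with a hypothetical toy grammar in which "BEGIN" must be closed by "END ;". The sketch below uses a brute-force search over every sequence of up to `max_inserted` terminals inserted at one position. That is not how the Correction Phase works (it backs the barrier up over the stack); the sketch only shows that an insertion limit of one cannot repair the missing "END ;", that a limit of two can, and that deleting the "BEGIN" already yields a valid program. All names are illustrative.

```python
from itertools import product
from typing import List, Optional

# Hypothetical toy grammar, loosely modelled on Sample Program #8:
#   program := stmt* "."
#   stmt    := "BEGIN" stmt* "END" ";"  |  "S" ";"    ("S" = any simple statement)
TERMINALS = ["BEGIN", "END", "S", ";", "."]


def parse_stmts(tokens: List[str], i: int) -> Optional[int]:
    """Recognize stmt* starting at i; return the index after it, or None."""
    while i < len(tokens) and tokens[i] in ("BEGIN", "S"):
        nxt = parse_stmt(tokens, i)
        if nxt is None:
            return None
        i = nxt
    return i


def parse_stmt(tokens: List[str], i: int) -> Optional[int]:
    """Recognize one stmt; tokens[i] is known to be "BEGIN" or "S"."""
    if tokens[i] == "S":
        return i + 2 if tokens[i + 1:i + 2] == [";"] else None
    j = parse_stmts(tokens, i + 1)  # body of BEGIN ... END ;
    if j is None or tokens[j:j + 2] != ["END", ";"]:
        return None
    return j + 2


def is_valid(tokens: List[str]) -> bool:
    i = parse_stmts(tokens, 0)
    return i is not None and tokens[i:] == ["."]


def find_insertion(tokens: List[str], pos: int, max_inserted: int) -> Optional[List[str]]:
    """First sequence of 1..max_inserted terminals that, inserted at pos,
    makes the program valid; None if there is no such sequence."""
    for k in range(1, max_inserted + 1):
        for seq in product(TERMINALS, repeat=k):
            if is_valid(tokens[:pos] + list(seq) + tokens[pos:]):
                return list(seq)
    return None


program = "BEGIN S ; S ; S ; .".split()  # the "END ;" is missing
before_dot = program.index(".")
print(find_insertion(program, before_dot, 1))  # None
print(find_insertion(program, before_dot, 2))  # ['END', ';']
print(is_valid(program[1:]))                   # True: deleting "BEGIN" also works
```

Raising the limit to two only moves the problem: the same construction with a construct needing three terminals defeats it again.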


Sample Program #9

    X.
    X < 2*3; .


The configuration of the parser at the point of error
detection is as follows:

Symbols Implied By States On Parsing Stack:
    <declaration-list> . <statement-list>
    <identifier> ?1

Input Symbol:
    "<"


The error is detected at this point because a statement
cannot start with an identifier followed by the symbol "<".
The possible parses are set up for the symbol "<". The
error recovery method does not find any solutions.
There is originally only one possible parse, since
there is only one state from which it is legal to "shift" on
the input symbol "<". That possible parse believes it is
parsing the Boolean expression in an "if" statement, and
encounters an error, the ";" being an illegal right
context for making the reduction <primary> => <digits>. At
this point the possible parse is in the following
configuration.

Symbols Implied By States On Parsing Stack:
    <declaration-list> . <statement-list>
    <identifier> ?1 <relational-op> <term> *
    <digits> ?2

Input Symbol:
    ";"


At this point, the error routine calls itself recursively,
and a new set of possible parses is set up for the symbol
";". All of the possible parses attempt to reduce over the
error point immediately after "shifting" the ";". None can
successfully break the barrier, and the ";" is deleted. The
error recovery routine tries again, this time setting up the
possible parses for the input symbol ".". All of the
possible parses attempt to reduce over the error point
immediately after "shifting" the ".". Again, none can
successfully break the barrier, and the "." is deleted. At
this point, the error recovery routine recognizes that it
would be pointless to set up possible parses for the
end-of-file condition, so it admits defeat and indicates
that it has found zero solutions.

If the method is implemented to consider a "simple way"
to break the barrier as being the insertion of two terminals
instead of only one, then a solution will be found. While
the possible parses are set up for the ";", the Correction
Phase will discover the solution of deleting the "<" and
inserting a ":" followed by an "=".

The reason this sample program is presented here is to
demonstrate what happens when the error recovery method
cannot find a solution. It should be expected that the
error recovery method will not always be able to provide a
solution. If the method allows the insertion of N
terminals, there could always be a case requiring the
insertion of N+1 terminals.


The problem with the method, as it is described in
Section 4, is that it does not know when to "give up". In
this sample program, there are no more statements following
the erroneous one. However, if there are additional
statements, the error routine will parse through them all,
still attempting to correct the first error. Suppose the
erroneous statement is followed by several completely
correct statements, each correctly terminated by a ";". For
each one, the error routine sets up possible parses for the
first symbol of the statement, parses the statement until it
reduces into <statement>, "shifts" the terminating ";", and
attempts the reduction <statement-list> => <statement-list>
<statement> ;. For all of the possible parses, this
reduction means reducing over the first error point. The
Correction Phase fails as it did previously, and consequently
the input symbol which was assumed correct in setting up the
possible parses is deleted. (The symbol deleted is the
first symbol of the correct statement.) The possible parses
are then set up for the next input symbol (the second symbol
of the correct statement), and the process continues.




In summary, each correct statement is parsed correctly,
but is unable to reduce into <statement-list> because of the
previous erroneous statement, for which the Correction Phase
could not find a solution. Each correct statement is
then deleted symbol by symbol. Occasionally, the remaining
portion of a statement will resemble another language
construct, and the error routine will call itself recursively
before resuming its pattern of deleting symbols.


Clearly, these spurious error messages throughout the
remainder of the program are not acceptable. What is needed
is a way for the error routine to know when a solution cannot
be found, so that it can enter "panic mode" and "clean up"
the stack before continuing. This error recovery method
should be implemented with a "panic mode" which keys on one
or more special symbols of the language. For this language
[Appendix A], a single special symbol, ";", should be
sufficient. "Panic mode" should be entered whenever the
error routine deletes a ";" as a result of discarding all
possible parses that were originally set up for that ";".
This is preferable to entering "panic mode" each time a ";"
is deleted.
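The clean-up itself could follow the classic "panic mode" described in Section 1.1: discard input symbols until the special symbol is found, then back up on the parsing stack to a state on which that symbol is a legal input symbol. The sketch below is a generic illustration under assumed inputs (a token list, a stack of state numbers, and the set of states that accept ";"). All names and state numbers are hypothetical, and the trigger condition described above is left to the caller.

```python
from typing import List, Optional, Set, Tuple


def panic_mode(
    tokens: List[str],
    error_index: int,
    stack: List[int],
    accepts_sync: Set[int],
    sync: str = ";",
) -> Optional[Tuple[int, List[int]]]:
    """Generic panic-mode recovery keyed on a single special symbol.

    Returns (index of the sync symbol, trimmed stack), or None if the input
    or the stack runs out before a resumption point is found.
    """
    i = error_index
    while i < len(tokens) and tokens[i] != sync:
        i += 1  # discard input symbols up to the special symbol
    if i == len(tokens):
        return None
    trimmed = list(stack)
    while trimmed and trimmed[-1] not in accepts_sync:
        trimmed.pop()  # back up until the special symbol is legal input
    if not trimmed:
        return None
    return i, trimmed


# Hypothetical state numbers: only state 4 accepts ";" as its input symbol.
tokens = "X < 2 * 3 ; Y : = 1 ;".split()
print(panic_mode(tokens, 1, [0, 4, 9, 12], {4}))  # (5, [0, 4])
```

Parsing would then resume with the ";" as the input symbol and state 4 on top of the stack.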


Sample Program #10

    A B X Y.
    X := Y:
    A := B;
    WRITE X A; .


The configuration of the parser at the point of error
detection is as follows:

Symbols Implied By States On Parsing Stack:
    <declaration-list> . <statement-list>
    <left-part> <identifier> : ?1

Input Symbol:
    "A" in line #3


The error is detected at this point because "X:=Y:" cannot
be followed by any symbol but an "=". The possible parses
are set up for the identifier "A". The error recovery
method yields three solutions.
The possible parse which yields solution #1 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:
    <declaration-list> . <statement-list>
    <left-part> <identifier> : ?1 <statement>

Input Symbol:
    ";" in line #3


The attempted reduction is:

    <statement> => <identifier> : <statement>

The Correction Phase finds no "simple way" to break the ?1
barrier. It then successfully backs the barrier over the
":" (token #5 in line #2). It again finds no "simple way"
to break the barrier, and it successfully backs the barrier
over the "Y" (token #4 in line #2). There is still no
"simple way" to break the barrier. It attempts to back the
barrier over the "=" (token #3 in line #2), but to do so
must delete that "=". At this point, the Correction Phase
realizes that deleting the "=" has broken the barrier, and
it yields the solution:

    DELETE "=" WHICH IS TOKEN #3 IN LINE #2


The possible parse which yields solution #2 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:
    <declaration-list> . <statement-list>
    <left-part> <identifier> : ?1 <assignment>

Input Symbol:
    ";" in line #3


The attempted reduction is:

    <assignment> => <left-part> <assignment>

The Correction Phase immediately finds that the insertion of
an "=" will break the ?1 barrier and yields the solution:

    INSERT "=" AFTER ":" WHICH IS TOKEN #5 IN LINE #2


The possible parse which yields solution #3 is in the
following configuration when it attempts to reduce over the
error point.

Symbols Implied By States On Parsing Stack:
    <declaration-list> . <statement-list>
    <left-part> <identifier> : ?1 <statement> ;

Input Symbol:
    "WRITE"
The attempted reduction is:

    <statement-list> => <statement-list> <statement> ;
The Correction Phase finds no "simple way" to break the ?1
barrier. It attempts to back the barrier over the symbol
":" (token #5 in line #2), but to do so must delete that
":". It then finds that the insertion of a ";" will break
the barrier. The consecutive deletion and insertion are
combined into a change, and the solution emitted is:

    CHANGE ":" WHICH IS TOKEN #5 IN LINE #2 TO ";"


All three possible solutions have the same cost
attributed to them, and the error recovery routine
arbitrarily chooses solution #1.


6. EVALUATION OF THE METHOD


6.1 Effectiveness


This recovery method meets all of the criteria
necessary for an effective error recovery method. Upon
correcting an error, it resumes parsing smoothly and does
not skip over any of the input string. If a second error is
encountered while attempting to correct the original error,
the error routine calls itself recursively and independently
corrects the second error before resuming the correction
attempt on the first error. This allows the error recovery
method to handle several errors in close proximity. The
recovery method has the ability to make corrections at any
point in the input string, even if the corrections require
modifying the parsing stack. It uses an unbounded
look-ahead scheme, the advantages of which are demonstrated
in Sample Program #6. The recovery method supplies extremely
helpful error messages: they explicitly state how the
program should be changed in order to make it syntactically
correct.
A disadvantage is that the method, as implemented, only
considers single insertions. There are cases where this is
insufficient. Of course, the method could be extended to
consider multiple insertions, but this would be less
efficient time-wise, and it is not clear that it would always
be more effective overall. For instance, the Correction
Phase might provide a solution by inserting two terminals
and then stop short of finding a less costly solution at an
earlier point in the input string. Regardless of how many
insertions are allowed in a correction attempt, there will
be times when the error routine will fail to find a
solution. When this occurs, the error routine must enter
"panic mode" and "clean up" the stack in order to resume the
parse smoothly. This is explained in more detail in the
discussion following Sample Program #9 in Section 5.

This error recovery method can produce some unexpected
results. For instance, it might provide a single solution
when there are several equally obvious solutions that it did
not find. At other times, it will provide all reasonable
solutions, including some that are not so obvious. This is
because the Correction Phase returns the first acceptable
solution that it finds for each possible parse. Multiple
solutions only occur when more than one possible parse
yields an acceptable solution. Overall, this error recovery
method is very effective. It usually provides at least one,
and often several, reasonable solutions.


6.2 Time Requirements


The following are rough estimates of the actual DEC-10
CPU time required to execute some of the sample programs.
They must be considered as rough estimates only, since the
CPU time recorded varied by up to 10% for identical sample
programs running in the same amount of core.

The LR parser itself has an overhead of approximately
4.75 seconds, which is due to its reading in the parsing
tables. The additional time required for the LR parser to
parse correct versions of the sample programs is negligible
relative to the parser overhead. For the first error
encountered, the error recovery routine has an overhead of
approximately 5.5 seconds. This is due to its reading in the
special tables that the error routine needs.

The time to correct each error varies depending on the
amount of core in which the program is executing. The
sample programs were always executed in 80K of virtual
memory. The DEC-10 CPU time was recorded both while
executing the programs in 30K of real memory and in 60K of
real memory.


The number of possible parses that are set up is a big
factor in the time required to correct an error. Even
though most possible parses are discarded fairly quickly,
just setting them up takes a significant amount of time.
Another big factor is the number of possible parses which
are discarded before attempting to reduce over the error
point and consequently never enter the Correction Phase.
While executing in 30K of real memory, the missing ";" in
Sample Program #1 takes 2.8 seconds to correct. (8 possible
parses are set up for the input symbol "WRITE".) When the
";" is missing from the last statement instead of the first
statement, it only takes 2.3 seconds to correct. (9
possible parses are set up for the input symbol ".".) Though
the number of possible parses that are set up is
approximately the same, the latter correction is quicker,
since only one possible parse (as opposed to three for the
former correction) attempts to reduce over the error point.
In Sample Program #4, the error is detected with an
identifier as the input symbol, and consequently 82 possible
parses are set up. The error routine requires .1.9 seconds
to provide the solutions for Sample Program #4. Recursion
is also a big factor in the time required, since a new set
of possible parses is set up for each level of recursion, and
the number of possible parses usually increases with
each successive level of recursion. In Sample Program #7,
the three missing right parentheses take a total of 26
seconds to correct.

While executing in 60K of real memory, the results are
significantly better. Sample Program #1 takes 1.2 seconds
to correct the missing ";". When the ";" is missing from
the last statement instead of the first statement, it only
takes .9 seconds to correct. Sample Program #4, with its 82
original possible parses, takes 2.9 seconds to correct, and
the three missing right parentheses in Sample Program #7
take a total of 17 seconds to correct.


In summary, while executing in 30K of real memory, the
error routine takes anywhere from 2 seconds to 10 seconds to
correct a single error. (The three missing right
parentheses are considered three errors.) While executing in
60K of real memory, the error routine takes anywhere from
less than 1 second up to 6 seconds to correct a single
error. Remember that these figures should only be
considered as rough estimates, and that they are based on
DEC-10 CPU time. A significantly faster machine would
require significantly less time.
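As a quick arithmetic check, the Sample Program #7 totals reported above fall inside these per-error ranges once they are divided among the three errors:

```python
# Totals reported above for Sample Program #7, whose three missing right
# parentheses are counted as three separate errors.
totals = {"30K": 26.0, "60K": 17.0}

per_error = {memory: seconds / 3 for memory, seconds in totals.items()}
for memory, seconds in per_error.items():
    print(f"{memory} of real memory: {seconds:.1f} seconds per error")
# 30K of real memory: 8.7 seconds per error
# 60K of real memory: 5.7 seconds per error
```

Both values lie within the stated ranges (2 to 10 seconds at 30K, under 6 seconds at 60K).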
6.3 Space Requirements
An LR parser implemented with this error recovery
method requires considerably more space than the same LR
parser without any error recovery method. There are four
main contributors to the additional space requirements, and
they will be discussed in the next several paragraphs. The
"big four" are the error routine code, the tokenized input
string, the special tables, and the error routine's main data
structures. There are many other data structures required
by the error routine, but the sum of their space
requirements is negligible when compared to the requirements
of the "big four".

The space required by the error routine code is
approximately 10K.

The input string, which is saved in tokenized form, can
also require a large amount of space. That space is simply
the space needed to hold the largest program (in terms of
number of tokens) which can be run on the system.


The Legal State Table in our implementation is an array
of length 518 and is indexed by an array of length 27 (one
for each terminal). The Predecessor States Table is an
array of length 984 indexed by an array of length 356 (one
for each state). The Predecessor States Table also includes
another array of length 356 to hold the symbol each state is
enterable on. The TEH States Table is an array of length
17ft<l indexed by an array of length 356 (one for each state).
The total space required for these three tables sums up to
approximately 4.4K.


Storage space is needed for each tentative correction
of every possible parse, as well as for pointers into the
tokenized input string indicating where each correction
should take place. These require two 2-dimensional arrays,
each of size (maximum number of corrections per possible
parse) * (maximum number of possible parses). Even more space
is required by the possible parses themselves and their
accompanying pointers into the tokenized input string.
These require two 2-dimensional arrays, each of size
(parsing stack limit) * (maximum number of possible parses).

Just to get a rough estimate of the space required,
assume that the maximum number of corrections per possible
parse is restricted to 5, and that the parsing stack is
restricted to a maximum depth of 50. This means that the
space required by the error routine's data structures will be
approximately (2*5 + 2*50) * (maximum number of possible
parses), which is equal to 110 * (maximum number of possible
parses). The problem is that the number of possible parses
needed can be quite large, especially if several levels of
recursion are allowed. The absolute minimum number of
possible parses, even if no recursion is allowed, is still
equal to the largest number of states from which it is legal
to "shift" for any given input symbol. For the language
implemented [Appendix A], there are 82 states from which it
is legal to "shift" when the input symbol is an identifier.
Thus, for our implementation, the absolute minimum space
required is 110*82 words of memory, which is approximately
10K.
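The estimate above can be written as a small formula: four arrays per possible parse, two sized by the correction limit and two by the stack limit. The function name is hypothetical; the limits are the ones assumed in the text. For 82 possible parses it gives 9,020 words, which the text rounds to approximately 10K.

```python
def error_routine_words(max_corrections: int, stack_limit: int, max_parses: int) -> int:
    """Words used by the error routine's main arrays, per the estimate above.

    Each possible parse needs two arrays of length max_corrections (tentative
    corrections and their input pointers) and two of length stack_limit
    (the parse stack and its input pointers).
    """
    per_parse = 2 * max_corrections + 2 * stack_limit
    return per_parse * max_parses


print(error_routine_words(5, 50, 1))   # 110 words per possible parse
print(error_routine_words(5, 50, 82))  # 9020 words
```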




This absolute minimum increases fairly rapidly as the
number of levels of recursion allowed is increased.
The problem is that usually more than one possible parse is
re-instated immediately prior to a recursive call of the
error routine. For example, in Sample Program #7 (three
unmatched left parentheses) the error routine recursively
calls itself twice. Both recursive calls, plus the original
call from the regular parser, occur on the input symbol ";".
There are only two states from which it is legal to "shift"
on the input symbol ";". However, both possible parses are
re-instated immediately prior to the first recursive call.
The pattern repeats itself, and the first and second levels
of recursion have four and eight possible parses
respectively. This increase in the number of possible
parses needed for each successive level of recursion is even
more troublesome if the input symbol is an identifier
instead of a ";". Also remember that in this example the
number of possible parses which are needed for each
successive level of recursion only doubles. It is possible
for the number to be multiplied by a much larger factor at
each successive level of recursion.
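The growth described above can be tabulated with a short sketch. It assumes that every level multiplies the number of possible parses by a constant factor, and that the possible parses of all suspended levels stay in memory while a deeper level runs; both are simplifying assumptions, and the doubling applied to the identifier case is purely hypothetical. The 110 words per possible parse comes from the estimate in the preceding paragraphs.

```python
from typing import List

# From the estimate above: (2*5 + 2*50) words for each possible parse.
WORDS_PER_PARSE = 2 * 5 + 2 * 50


def parses_per_level(initial: int, factor: int, levels: int) -> List[int]:
    """Possible parses at the original call and at each level of recursion,
    assuming every level multiplies the count by the same factor."""
    counts = [initial]
    for _ in range(levels):
        counts.append(counts[-1] * factor)
    return counts


# Sample Program #7: two parses on ";", doubling at each of two recursive calls.
semicolon = parses_per_level(2, 2, 2)
print(semicolon)                          # [2, 4, 8]
print(sum(semicolon) * WORDS_PER_PARSE)   # 1540 words if all levels are live at once

# Hypothetical: the same doubling, starting from the 82 parses for an identifier.
identifier = parses_per_level(82, 2, 2)
print(identifier)                         # [82, 164, 328]
print(sum(identifier) * WORDS_PER_PARSE)  # 63140 words
```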

Remember that all of these figures are based on our
implementation and will vary with the language that is
implemented.


6.4 Ease of Implementation

There is no problem in constructing the special tables
that are needed by the error routine. The tables are
automatically generated from the parsing tables.


The LR parser itself must be modified somewhat. The
lexical analysis phase must be modified to save each token
that it processes, as well as to check whether the next token
it is supposed to process has already been consumed. The
parser must be modified to store a corresponding pointer
into the tokenized input string for each state that it
pushes onto the parsing stack. These modifications and the
rationale behind them are described in more detail in
Section 4.6.
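The two modifications can be sketched as follows, assuming a lexer that yields token strings. `TokenBuffer`, `ParseStack`, and the state numbers are hypothetical. The buffer saves every token it hands out and replays saved tokens after the parser is rewound; each stack entry pairs a state with the index of the next input token at the moment the state was pushed.

```python
from typing import Iterable, Iterator, List, Tuple

EOF = "<end-of-file>"


class TokenBuffer:
    """Saves every token produced by the lexer so it can be read again."""

    def __init__(self, lexer_tokens: Iterable[str]) -> None:
        self._source: Iterator[str] = iter(lexer_tokens)
        self.saved: List[str] = []  # the tokenized input string
        self.cursor = 0             # index of the next token to hand out

    def next_token(self) -> str:
        if self.cursor < len(self.saved):
            tok = self.saved[self.cursor]  # already consumed: replay it
        else:
            nxt = next(self._source, None)
            if nxt is None:
                return EOF  # do not advance past the end of the input
            tok = nxt
            self.saved.append(tok)
        self.cursor += 1
        return tok

    def rewind(self, index: int) -> None:
        self.cursor = index


class ParseStack:
    """Parsing stack whose entries pair a state with a pointer into the input."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int]] = []

    def push(self, state: int, buf: TokenBuffer) -> None:
        self.entries.append((state, buf.cursor))


buf = TokenBuffer(["READ", "A", ";"])
stack = ParseStack()
stack.push(0, buf)                         # hypothetical start state, pointer 0
print(buf.next_token(), buf.next_token())  # READ A
stack.push(7, buf)                         # hypothetical state, pointer 2
buf.rewind(stack.entries[0][1])            # back up to state 0's input point
print(buf.next_token())                    # READ, replayed from the saved tokens
```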
The Cost Vectors must be filled in and a cost limit
chosen. This can be done in a trivial amount of time and
still provide good results. Our sample programs produced
good results and were run with a constant cost for all
insertions and deletions. More sophisticated costs should
not take much longer to determine. For example,
insertions and deletions of single-character tokens could be
given a relatively low cost, while a relatively high cost
is assigned to the deletion of keywords of the language.
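One way the Cost Vectors and the cost limit might be used is sketched below, with hypothetical costs in the spirit of the suggestion above (keywords cost 2 to insert and 5 to delete; every other terminal costs 1). Candidates over the limit are discarded, and ties go to the earliest candidate. The implementation described here considers only single insertions, so the two-terminal candidate is illustrative only; the point is that weighting keyword deletions can change which correction is preferred.

```python
from typing import Dict, List, Optional, Tuple

Edit = Tuple[str, str]  # ("INSERT" or "DELETE", terminal)


def correction_cost(edits: List[Edit], insert_cost: Dict[str, int],
                    delete_cost: Dict[str, int], default: int = 1) -> int:
    """Sum the cost-vector entries of every insertion and deletion."""
    total = 0
    for op, terminal in edits:
        vector = insert_cost if op == "INSERT" else delete_cost
        total += vector.get(terminal, default)  # unlisted terminals cost `default`
    return total


def cheapest(candidates: List[List[Edit]], insert_cost: Dict[str, int],
             delete_cost: Dict[str, int], limit: int) -> Optional[List[Edit]]:
    """Cheapest candidate whose cost does not exceed the limit.

    Ties go to the earliest candidate; None means every candidate was over
    the limit.
    """
    best: Optional[List[Edit]] = None
    best_cost = limit + 1
    for edits in candidates:
        cost = correction_cost(edits, insert_cost, delete_cost)
        if cost < best_cost:
            best, best_cost = edits, cost
    return best


KEYWORDS = {"BEGIN", "END", "READ", "WRITE"}
candidates = [
    [("DELETE", "BEGIN")],                 # the repair reported for Sample Program #8
    [("INSERT", "END"), ("INSERT", ";")],  # a two-terminal alternative
]
constant = cheapest(candidates, {}, {}, limit=10)
weighted = cheapest(candidates,
                    {k: 2 for k in KEYWORDS},  # keyword insertion: 2
                    {k: 5 for k in KEYWORDS},  # keyword deletion: 5
                    limit=10)
print(constant)  # [('DELETE', 'BEGIN')]
print(weighted)  # [('INSERT', 'END'), ('INSERT', ';')]
```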
7. CONCLUSIONS


The implementation of this error recovery method in an
actual compiler appears to be feasible. Most major
programming languages are larger than the language [Appendix
A] on which the method was tested. However, the language we
implemented is large enough, and in particular contains
enough similar constructs, to risk some nondeterminism
problems. Thus, we do not believe that the performance
would be greatly affected by implementing a larger language.

The time required to correct each error is greater than
is desirable. However, the additional time required can
certainly be justified in an environment where clear,
accurate error messages are essential (i.e., any environment
where the programmer frequently has minimal knowledge of
the language). The experienced programmer would also
benefit, in that most errors are corrected and the
programmer does not have to search through many spurious
error messages.


For some languages, the average time to correct an
error might be significantly improved by always skipping the
input symbol on which an error is detected, and then setting
up the possible parses for the immediately following symbol.
For the case where the error is detected immediately, with a
symbol that should be deleted being the current input
symbol, this would save the time involved in setting up and
then discarding the first set of possible parses.

A slight improvement in the average time to correct an
error could be achieved by "fine tuning" the cost vectors
and the cost limit. This would allow erroneous possible
parses to be discarded more quickly.
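
The effect of the cost limit on how quickly erroneous possible parses are discarded can be illustrated with a minimal sketch. The `PossibleParse` record, the `extend` and `prune` helpers and the cost values are hypothetical. A real possible parse would also carry its parser state, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PossibleParse:
    """Hypothetical stand-in for a possible parse: its edits and their total cost."""
    edits: tuple
    cost: int


def extend(parse, edit, insert_cost, delete_cost):
    """Apply one ("insert" | "delete", token) edit, accumulating its cost."""
    kind, token = edit
    step = insert_cost[token] if kind == "insert" else delete_cost[token]
    return PossibleParse(parse.edits + (edit,), parse.cost + step)


def prune(parses, cost_limit):
    """Discard every possible parse whose accumulated cost exceeds the limit."""
    return [p for p in parses if p.cost <= cost_limit]


if __name__ == "__main__":
    insert_cost = {";": 1, "END": 2}   # assumed cost vectors
    delete_cost = {";": 1, "END": 5}
    start = PossibleParse((), 0)
    edits = [("insert", ";"), ("delete", ";"), ("insert", "END"), ("delete", "END")]
    candidates = [extend(start, e, insert_cost, delete_cost) for e in edits]
    print(len(prune(candidates, 4)))  # 3: only the END deletion (cost 5) is dropped
    print(len(prune(candidates, 1)))  # 2: the tighter limit also drops the END insertion
```

A tighter limit, or higher costs on unlikely edits, removes candidates at an earlier step. It then no longer has to carry them forward through further input symbols.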

The biggest problem of this method is its space
requirements. One possibility is to obtain additional space
by not even attempting to execute a program with one or more
syntax errors. If this were the policy, then the space
normally taken up by semantic routines could be used for the
error routine.


However, even if the semantic routine space were
available to the error routine, some sort of restriction
would still be necessary on the levels of recursion allowed,
since the memory cannot possibly be big enough to handle every
hypothetical case. Cases which result in a multitude of
possible parses occur relatively infrequently, so allowing
many levels of recursion is not really that advantageous.
Restricting the algorithm to N levels of recursion would
merely mean that only N, and not N+1, additional errors in
close proximity to the first error could be corrected.
(Errors in close proximity are errors which are encountered
before the first error is corrected.) Still, allowing a few
levels of recursion is desirable in order to be able to
correct a few errors in close proximity. The solution to this
problem appears to be conditional recursion. An absolute
limit on the number of possible parses allowed could be set,
dependent on the amount of memory available to the error
routine. Recursion could then be allowed as long as the
number of possible parses remained under this absolute limit.
This would be preferable to placing a strict limit on the
number of levels of recursion allowed.
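
The difference between a strict depth limit and the conditional recursion proposed above can be sketched by counting how many nearby errors each rule lets the method handle. This is a hedged illustration. The function names are hypothetical, and the numbers of new possible parses per error and the absolute limit are invented inputs, not measurements from this report.

```python
def depth_limited(new_parses_per_error, max_depth):
    """Strict rule: at most max_depth nearby errors, regardless of memory use."""
    return min(len(new_parses_per_error), max_depth)


def conditionally_limited(new_parses_per_error, parse_limit):
    """Conditional rule: recurse only while the pool of possible parses fits the limit.

    new_parses_per_error[i] is the (assumed) number of possible parses that the
    i-th error in close proximity would add to the pool.
    """
    live = 0
    for handled, new in enumerate(new_parses_per_error):
        if live + new > parse_limit:   # recursing would exceed the absolute limit
            return handled
        live += new
    return len(new_parses_per_error)


if __name__ == "__main__":
    limit = 10
    cheap = [2, 2, 2, 2]    # several nearby errors, each spawning few parses
    costly = [8, 8]         # nearby errors that each spawn many parses
    print(depth_limited(cheap, 2), conditionally_limited(cheap, limit))    # 2 4
    print(depth_limited(costly, 2), conditionally_limited(costly, limit))  # 2 1
```

In the first case the conditional rule corrects more nearby errors than a fixed depth of 2. In the second it stops after one, because the fixed depth would push the pool to 16 possible parses, past the assumed limit of 10.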

Being unable to execute an incorrect program is not a
significant drawback to the error recovery method. Executing
incorrect programs might even tend to encourage careless
programming. In any event, a programmer can hardly complain
if an error recovery method detects every error in a program
and provides a helpful message for each one, but then fails
to execute the program.


This error recovery method does an excellent job of
providing a reasonable correction for most syntax errors in
a program. The error messages provided are very helpful, and
even the most inexperienced programmer should not have any
trouble interpreting them. One might even want to implement
an LR parser with this error recovery method strictly as a
syntax checker.


<program>           =>  <declaration-list> . <statement-list> .

<declaration-list>  =>
                    =>  <declaration-list> <identifier>
                    =>  <declaration-list> <identifier> [ <digits> ]

<statement-list>    =>
                    =>  <statement-list> <statement> ;

<statement>         =>  GOTO <identifier>
                    =>  READ <input-list>
                    =>  WRITE <output-list>
                    =>  IF <boolean-expr> THEN <statement> ELSE <statement>
                    =>  <identifier> : <statement>
                    =>  BEGIN <statement-list> END
                    =>  <assignment>

<input-list>        =>
                    =>  <input-list> <variable>

<output-list>       =>
                    =>  <output-list> <variable>
                    =>  <output-list> <character>

<boolean-expr>      =>  <expression> <relational-op> <expression>

<relational-op>     =>  <
                    =>  =

<assignment>        =>  <left-part> <assignment>
                    =>  <left-part> <expression>

<left-part>         =>  <identifier> :=
                    =>  <subscripted-var> :=

<subscripted-var>   =>  <identifier> [ <expression> ]
                    =>  <identifier> [ <assignment> ]

<variable>          =>  <identifier>
                    =>  <subscripted-var>

<character>         =>  ' <identifier>
                    =>  ' <digits>
                    =>  ' ,
                    =>  ' .

<expression>        =>  <term>
                    =>  <expression> + <term>
                    =>  <expression> - <term>

<term>              =>  <factor>
                    =>  <term> * <factor>
                    =>  <term> / <factor>

<factor>            =>  <primary>
                    =>  <primary> ** <factor>

<primary>           =>  <variable>
                    =>  <digits>
                    =>  <character>
                    =>  ( <expression> )
                    =>  ( <assignment> )